TITLE

"NUCLEIC ACID SEQUENCE AND METHOD FOR SELECTIVELY EXPRESSING A PROTEIN IN A TARGET CELL OR TISSUE"

FIELD OF THE INVENTION

THIS INVENTION relates generally to gene therapy. More particularly, the present invention relates to a synthetic nucleic acid sequence and to a method for selectively expressing a protein in a target cell or tissue in which at least one existing codon of a parent nucleic acid sequence encoding the protein has been replaced with a synonymous codon. The invention also relates to production of virus particles using one or more synthetic nucleic acid sequences and the method according to the invention.






BACKGROUND OF THE INVENTION 



While gene therapy is of great clinical interest for treatment of gene defects, this therapy has not entered into mainstream clinical practice because selective delivery of genes to target tissues has proven extremely difficult. Currently, viral vectors are used, particularly retroviruses and adenoviruses, which are to some extent selective. However, many vector systems are by their nature unable to produce stable integrants and some also invoke immune responses, thereby preventing effective treatment. Alternatively, "naked" DNA is packaged in liposomes or other similar delivery systems. A major problem to be overcome is that such gene delivery systems themselves are not tissue selective, whereas selective targeting of genes to particular tissues would be desirable for many disorders (e.g., cancer therapy). While use of tissue-specific promoters to target gene therapy has been effective in some animal models, it has proven less so in man, and selective tissue-specific promoters are not available for a wide range of tissues.

The current invention has arisen unexpectedly from recent investigations exploring why papillomavirus (PV) late gene expression is restricted to differentiated keratinocytes. In this regard, it is known that PV late genes L1 and L2 are only expressed in non-dividing differentiated keratinocytes (KCs). Many investigators, including the present inventors, have been unable to detect significant PV L1 and L2 protein expression when these genes are transduced or transfected into undifferentiated cultured cells, using a range of conventional constitutive viral promoters including retroviral long terminal repeats (LTRs) and the strong constitutive promoters of CMV and SV40.

PV L1 mRNA can, however, be efficiently translated in vitro using rabbit reticulocyte cell lysate, suggesting that there are no cellular inhibitors in the lysate interfering with translation of L1. The major difference between the in vitro and in vivo translation systems is that L1 mRNA is the dominant mRNA in in vitro translation reactions, while it constitutes a minor fraction of the cellular mRNAs in intact cells.

In vivo, PV late proteins are not produced in undifferentiated KCs. However, they are expressed in large quantity in highly differentiated KCs. The mechanism of this tight control of late gene expression has been poorly understood, and searches by many groups for KC-specific PV gene transcriptional control proteins have been unrewarding.

Blockage to translation of L1 mRNA in vivo has been attributed to sequences within the L1 ORF (Tan et al. 1995, J. Virol. 69 5607-5620; Tan and Schwartz, 1995, J. Virol. 69 2932-2945). By using Rev and the Rev-responsive element of HIV, such inhibition could be overcome (Tan et al. 1995, supra). Accordingly, the inventors examined whether removal of putative "inhibitory sequences" in the L1 ORF would allow production of L1 protein in undifferentiated cells. Deletion mutagenesis of BPV L1 to remove putative inhibitory sequences and expression of resultant deletion mutants in CV-1 cells revealed, surprisingly, that despite expression of L1 mRNA, L1 protein could not be detected.

In view of the foregoing, it has been difficult hitherto to understand how papillomaviruses produce large amounts of L1 protein in the late stage of their life cycle using this apparently "untranslatable" gene.

Surprisingly, however, it has now been discovered that PV L1 protein can be produced at substantially enhanced levels in an undifferentiated host cell by replacing existing codons of a native L1 gene with synonymous codons used at relatively high frequency by genes of the undifferentiated host cell compared to the existing codons. It has also been found unexpectedly that there are substantial differences in the relative abundance of particular isoaccepting transfer RNAs (tRNAs) in different cells or tissues, and that this plays a pivotal role in protein expression from a gene with a given codon usage or composition. This discovery has been reduced to practice in synthetic nucleic acid sequences and generic methods, which utilize codon alteration as a means for targeting expression of a protein to particular cells or tissues or, alternatively, to cells in a specific state of differentiation.

OBJECT OF THE INVENTION

It is therefore an object of the present invention to provide a synthetic nucleic acid sequence and a method for selectively expressing a protein in a target cell or tissue, which sequence and method ameliorate at least some of the disadvantages associated with the prior art.

SUMMARY OF THE INVENTION 

Accordingly, in one aspect of the invention, there is provided a synthetic nucleic acid sequence capable of selectively expressing a protein in a target cell or tissue of a mammal, wherein said selective expression is effected by replacing at least one existing codon of a parent nucleic acid sequence with a synonymous codon to form said synthetic nucleic acid sequence.

Suitably, said synonymous codon corresponds to an iso-tRNA which, when compared to an iso-tRNA corresponding to the at least one existing codon, is in higher abundance in the target cell or tissue relative to one or more other cells or tissues of the mammal.

Preferably, said synonymous codon corresponds to an iso-tRNA which, when compared to an iso-tRNA corresponding to the at least one existing codon, is in higher abundance in the target cell or tissue relative to a precursor cell or tissue.

Alternatively, said synonymous codon corresponds to an iso-tRNA which, when compared to an iso-tRNA corresponding to the at least one existing codon, is in higher abundance in the target cell or tissue relative to a cell or tissue derived therefrom.

Advantageously, said corresponding iso-tRNA in said target cell or tissue is at a level which is at least 110%, preferably at least 200%, more preferably at least 500%, and most preferably at least 1000%, of that expressed in the or each other cell or tissue of the mammal.

Alternatively, the synonymous codon may be selected from the group consisting of (1) a codon used at relatively high frequency by genes, preferably highly expressed genes, of the target cell or tissue, (2) a codon used at relatively high frequency by genes, preferably highly expressed genes, of the or each other cell or tissue, (3) a codon used at relatively high frequency by genes, preferably highly expressed genes, of the mammal, (4) a codon used at relatively low frequency by genes of the target cell or tissue, (5) a codon used at relatively low frequency by genes of the or each other cell or tissue, (6) a codon used at relatively low frequency by genes of the mammal, (7) a codon used at relatively high frequency by genes of another organism, and (8) a codon used at relatively low frequency by genes of another organism.

In a preferred embodiment, the at least one existing codon and the synonymous codon are selected such that said protein is expressed from said synthetic nucleic acid sequence in said target cell or tissue at a level which is at least 110%, preferably at least 200%, more preferably at least 500%, and most preferably at least 1000%, of that expressed from said parent nucleic acid sequence in said target cell or tissue.

In another aspect, the invention resides in a method for selectively expressing a protein in a target cell or tissue of a mammal, wherein said selective expression is effected by replacing at least one existing codon of a parent nucleic acid sequence with a synonymous codon to form a synthetic nucleic acid sequence.

Preferably, the method is further characterized by the steps of:

(a) replacing at least one existing codon of a parent nucleic acid sequence encoding said protein with a synonymous codon to produce a synthetic nucleic acid sequence having altered translational kinetics compared to said parent nucleic acid sequence such that said protein is selectively expressible in said target cell or tissue;

(b) administering to the mammal and introducing into said target cell or tissue, or a precursor cell or precursor tissue thereof, said synthetic nucleic acid sequence operably linked to one or more regulatory nucleotide sequences; and

(c) selectively expressing said protein in said target cell or tissue.
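Step (a) can be illustrated with a short Python sketch. It is not taken from the specification: the function names `translate` and `recode`, the example parent sequence and the chosen replacements are all hypothetical, and the sketch assumes the standard genetic code. It shows only the defining constraint of step (a), namely that each replacement codon must encode the same amino acid as the codon it replaces.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def recode(parent, replacements):
    """Return a synthetic sequence in which existing codons are swapped for
    synonymous codons according to `replacements` (uppercase DNA codons,
    existing codon -> replacement codon). Non-synonymous swaps are rejected."""
    parent = parent.upper().replace("U", "T")
    if len(parent) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(parent), 3):
        codon = parent[i:i + 3]
        new = replacements.get(codon, codon)
        if CODON_TABLE[new] != CODON_TABLE[codon]:
            raise ValueError(f"{new} is not synonymous with {codon}")
        out.append(new)
    return "".join(out)


# Hypothetical parent sequence encoding Met-Ala-Leu-stop.
parent = "ATGGCTCTGTAA"
synthetic = recode(parent, {"GCT": "GCA", "CTG": "CTT"})
```

In this sketch the encoded protein is unchanged by construction; which codons to swap would be decided separately, for example from measured iso-tRNA abundance or codon usage as described in this specification.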

Preferably, the method further includes, prior to step (a):

(i) measuring relative abundance of different isoacceptor transfer RNAs in said target cell or tissue, and in one or more other cells or tissues of the mammal; and

(ii) identifying said at least one existing codon and said synonymous codon based on said measurement, wherein said synonymous codon corresponds to an iso-tRNA which, when compared to an iso-tRNA corresponding to the existing codon, is in higher abundance in said target cell or tissue relative to the or each other cell or tissue of the mammal.

Suitably, step (ii) above is further characterized in that said synonymous codon corresponds to an iso-tRNA which, when compared to an iso-tRNA corresponding to the at least one existing codon, is in higher abundance in the target cell or tissue relative to a precursor cell or tissue.

Alternatively, step (ii) above is further characterized in that said synonymous codon corresponds to an iso-tRNA which, when compared to an iso-tRNA corresponding to the at least one existing codon, is in higher abundance in the target cell or tissue relative to a cell or tissue derived therefrom.

Alternatively, the method further includes, prior to step (a), identifying said at least one existing codon and said synonymous codon based on respective relative frequencies of particular codons used by genes selected from the group consisting of (I) genes of the target cell or tissue, (II) genes of the or each other cell or tissue, (III) genes of the mammal, and (IV) genes of another organism.

In yet another aspect, the invention provides a method for expressing a protein in a target cell or tissue from a first nucleic acid sequence including the steps of:

introducing into said target cell or tissue, or a precursor cell or precursor tissue thereof, a second nucleic acid sequence encoding at least one isoaccepting transfer RNA wherein said second nucleic acid sequence is operably linked to one or more regulatory nucleotide sequences, and wherein said at least one isoaccepting transfer RNA is normally in relatively low abundance in said target cell or tissue and corresponds to a codon of said first nucleic acid sequence.

In a further aspect, the invention extends to a method for producing a virus particle in a cycling eukaryotic cell, said virus particle comprising at least one protein necessary for assembly of said virus particle, wherein said at least one protein is not expressed in said cell from a parent nucleic acid sequence at a level sufficient to permit virus assembly therein, said method including the steps of:

(a) replacing at least one existing codon of said parent nucleic acid sequence with a synonymous codon to produce a synthetic nucleic acid sequence having altered translational kinetics compared to said parent nucleic acid sequence such that said at least one protein is expressible from said synthetic nucleic acid sequence in said cell at a level sufficient to permit virus assembly therein;

(b) introducing into said cell or a precursor thereof said synthetic nucleic acid sequence operably linked to one or more regulatory nucleotide sequences; and

(c) expressing said at least one protein in said cell in the presence of other viral proteins required for assembly of said virus particle to thereby produce said virus particle.

In yet a further aspect of the invention, there is provided a method for producing a virus particle in a cycling cell, said virus particle comprising at least one protein necessary for assembly of said virus particle, wherein said at least one protein is not expressed in said cell from a parent nucleic acid sequence at a level sufficient to permit virus assembly therein, and wherein at least one existing codon of said parent nucleic acid sequence is rate limiting for the production of said at least one protein to said level, said method including the step of introducing into said cell a nucleic acid sequence capable of expressing therein an isoaccepting transfer RNA specific for said at least one codon.

BRIEF DESCRIPTION OF THE DRAWINGS



Figure 1A depicts the nucleotide sequence (SEQ ID NO:1) and deduced amino acid sequence (SEQ ID NO:2) of BPV1 L1. Amino acids (in single letter code) are presented below the second nucleotide of each codon. Mutations introduced into the genes are indicated above the corresponding nucleotides of the original sequence. Horizontal lines indicate the sites and enzymes used for cloning. This replacement of nucleotides resulted in a nucleic acid sequence encoding BPV-1 L1 polypeptide with an amino acid sequence identical to the wild type, but having synonymous codons that are frequently used by mammalian genes.

Figure 1B shows the nucleotide sequence (SEQ ID NO:5) and deduced amino acid sequence (SEQ ID NO:6) relating to BPV1 L2 ORF. Amino acids (in single letter code) are presented below the second nucleotide of each codon. Mutations introduced into the genes are indicated above the corresponding nucleotides of the original sequence. Horizontal lines indicate the sites and enzymes used for cloning. This replacement of nucleotides resulted in a nucleic acid sequence encoding BPV-1 L2 polypeptide with an amino acid sequence identical to the wild type, but having synonymous codons that are frequently used by mammalian genes.

Figure 1C depicts the nucleotide sequence (SEQ ID NO:9) and deduced amino acid sequence (SEQ ID NO:10) of green fluorescent protein (GFP). Amino acids (in single letter code) are presented below the second nucleotide of each codon. Mutations introduced into the genes are indicated above the corresponding nucleotides of the original sequence. Horizontal lines indicate the sites and enzymes used for cloning. This replacement of nucleotides resulted in a nucleic acid sequence encoding GFP polypeptide with an amino acid sequence identical to the native sequence modified for optimal expression in eukaryotic cells, but having synonymous codons that are frequently used by papillomavirus genes.

Figure 2A shows detection of L1 protein expressed from synthetic and wild type BPV1 L1 genes. Cos-1 cells were transfected with a synthetic L1 expression plasmid pCDNA/HBL1, and a wild type L1 expression plasmid pCDNA/BPVL1wt. The expression of L1 was detected by immunofluorescent staining. Cells were fixed after 36 hrs and incubated with rabbit anti-BPV1 L1 antiserum, followed by FITC-conjugated goat anti-rabbit IgG antibody.

Figure 2B shows detection by Western blot of L1 protein from Cos-1 cells transfected with pCDNA/HBL1 and pCDNA/BPVL1wt.

Figure 2C shows a Northern blot in which L1 mRNA extracted from transfected cells was probed with 32P-labeled probes produced from wild type L1 sequence. The amount of mRNA loaded in respective lanes was examined by hybridization of the mRNA sample with a gapdh probe.

Figure 3A shows detection of L2 protein expressed from synthetic and wild type BPV1 L2 genes. Cos-1 cells were transfected with a synthetic L2 expression plasmid pCDNA/HBL2, and a wild type L2 expression plasmid pCDNA/BPVL2wt. The expression of L2 was detected by immunofluorescent staining. Cells were fixed after 36 hrs and incubated with rabbit anti-BPV1 L2 antiserum, followed by FITC-conjugated goat anti-rabbit IgG antibody.

SUBSTITUTE SHEET (RULE 26)

Figure 3B shows detection by Western blot of L2 protein from Cos-1 cells transfected with pCDNA/HBL2 and pCDNA/BPVL2wt.

Figure 3C shows a Northern blot in which L2 mRNA extracted from transfected cells was probed with 32P-labeled probes produced from wild type L2 sequence. The amount of mRNA loaded in respective lanes was examined by hybridization of the mRNA sample with a gapdh probe.

Figure 4 shows in vitro translation of BPVL1 sequences, wild type BPVL1 (wt) or synthetic L1 (HB), using rabbit reticulocyte lysate or wheat germ extract in the presence of 35S-methionine. In the top panel, wt L1 or HB L1 plasmid DNA was added to the T7 DNA polymerase-coupled in vitro translation system. L1 protein was detected by Western blot analysis. In the bottom panel, the translation efficiency of wt L1 or HB L1 sequences in the presence or absence of tRNA was compared. Translation was carried out in rabbit reticulocyte lysate (rabbit) or wheat germ extract (wheat), and samples were collected every two minutes starting from minute 8. The left side of the lower panel indicates whether 10^-5 M bovine liver or yeast tRNA was supplied.

Figure 5A is a schematic representation of plasmids used to determine L2 expression from BPV cryptic promoter(s). The wild type L1 sequence and most of the wild type L2 sequence were deleted from the BPV1 genome by BamHI and HindIII digestion and the remaining BPV1 sequence (in yellow) was cloned into pUC18. Wild type or synthetic humanized L2 sequences (in red) were inserted into the BamHI site of the BPV1 genome. The position of the inserted SV40 ori sequence (in white) is indicated. A plasmid carrying the modified L2 sequence but lacking the SV40 ori sequence was also used as a control. The plasmids were transfected into Cos-1 cells and the expression of L2 protein was determined using BPV1 L2-specific polyclonal antiserum followed by FITC-linked anti-rabbit IgG.

Figure 5B shows expression of L2 protein from native papillomavirus promoter. The plasmids shown in Figure 5A were used to transfect Cos-1 cells and the expression of L2 protein was determined using BPV1 L2-specific polyclonal antiserum followed by FITC-linked anti-rabbit IgG. A mock transfection in which the cells did not receive plasmid was used as control.

Figure 6 shows expression of GFP in Cos-1 cells transfected with wild-type gfp (wt) or a synthetic gfp gene carrying codons used at relatively high frequency by papillomavirus genes (p). The mRNA extracted from cells transfected with gfp or P gfp was probed with 32P-labeled gfp probe and is shown on the right panel, using gapdh as a reference gene.

Figure 7 shows the expression pattern of GFP in vivo from wild-type gfp gene, or a synthetic gfp gene carrying codons used at relatively high frequency by papillomavirus genes. Using a gene gun, mice were shot with PGFP (left panel) and GFP (right panel) expression plasmids encoding GFP protein. A transverse section of the mouse skin shows where the gfp gene is expressed. Bright-field photographs of the same section, in which dermis (D) and epidermis (E) are highlighted, are shown to identify the location of fluorescence in the epidermis. Arrows indicate fluorescent signals.

DETAILED DESCRIPTION

The present invention arises from the unexpected discovery that the relative abundance of different isoaccepting transfer RNAs varies in different cells or tissues, or alternatively in cells or tissues in different states of differentiation or in different stages of the cell cycle, and that such differences may be exploited together with codon composition of a gene to regulate and direct expression of a protein to a particular cell or tissue, or alternatively to a cell or tissue in a specific state of differentiation or in a specific stage of the cell cycle. According to the present invention, this selective targeting is effected by replacing at least one existing codon of a parent nucleic acid sequence encoding the protein with a synonymous codon.

Replacement of existing codons with synonymous codons is not new per se. In this regard, we refer to International Application Publication No WO 96/09378, which utilizes such substitution to provide a method of expressing proteins of eukaryotic and viral origin at high levels in in vitro mammalian cell culture systems, the main thrust of the method being the harvesting of such proteins. In distinct contrast, the present invention utilizes substitution of one or more codons in a gene for targeting expression of the gene to particular cells or tissues with the ultimate aim of facilitating gene therapy as described herein.

The term "synonymous codon" as used herein refers to a codon having a different nucleotide sequence to an existing codon but encoding the same amino acid as the existing codon.
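The definition above can be made concrete with a small Python sketch that lists the synonymous codons of a given codon. It assumes the standard genetic code; the function name `synonymous_codons` is hypothetical and not part of the specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codons(codon):
    """Return every other codon that encodes the same amino acid as `codon`.

    Accepts DNA or RNA spelling in either case; results are uppercase DNA."""
    codon = codon.upper().replace("U", "T")
    amino_acid = CODON_TABLE[codon]
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid and c != codon)
```

For example, `synonymous_codons("gca")` lists the three other alanine codons, while a codon such as ATG (Met) has no synonymous codon under the standard code and yields an empty list.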

By "isoaccepting transfer RNA" is meant one or more transfer RNA molecules that differ in their anticodon structure but are specific for the same amino acid.

Throughout this specification, unless the context requires otherwise, the words "comprise", "comprises" and "comprising" will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.

Selection of synonymous codons 

Determination of relative abundance of different tRNA species in different cells

Advantageously, the synonymous codon corresponds to an isoaccepting transfer RNA (iso-tRNA) which, when compared to an iso-tRNA corresponding to the at least one existing codon, is in higher abundance in the target cell or tissue relative to one or more other cells or tissues of the mammal.

Any method for determining the relative abundance of an iso-tRNA in two or more cells or tissues may be employed. For example, such method may include isolating two or more particular cells or tissues from a mammal, preparing an RNA extract from each cell or tissue which extract includes tRNA, and probing each extract respectively with different nucleic acid sequences each being specific for a particular iso-tRNA to determine the relative abundance of an iso-tRNA between the two or more cells or tissues.

Suitable methods for isolating particular cells or tissues are well known to those of skill in the art. For example, one can take advantage of one or more particular characteristics of a cell or tissue to specifically isolate the cell or tissue from a heterogeneous population. Such characteristics include, but are not limited to, anatomical location of a tissue, cell density, cell size, cell morphology, cellular metabolic activity, cell uptake of ions such as Ca2+, K+, and H+ ions, cell uptake of compounds such as stains, markers expressed on the cell surface, cytokine expression, protein fluorescence, and membrane potential. Suitable methods that may be used in this regard include surgical removal of tissue, flow cytometry techniques such as fluorescence-activated cell sorting (FACS), immunoaffinity separation (e.g., magnetic bead separation such as Dynabead™ separation), density separation (e.g., metrizamide, Percoll™, or Ficoll™ gradient centrifugation), and cell-type specific density separation (e.g., Lymphoprep™). For example, dividing cells or blast cells may be separated from non-dividing cells or resting cells according to cell size by FACS or metrizamide gradient separation.

Any suitable method for isolating total RNA from a cell or tissue may be used. Typical procedures contemplated by the invention are described in CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (Ausubel, et al., eds) (John Wiley & Sons, Inc. 1997), hereby incorporated by reference, at page 4.2.1 through page 4.2.7. Preferably, techniques which favor isolation of tRNA are employed as, for example, described in Brunngraber, E.F. (1962, Biochem. Biophys. Res. Commun. 8:1-3) which is hereby incorporated by reference.

The probing of an RNA extract is suitably effected with different oligonucleotide sequences each being specific for a particular iso-tRNA. Of course it will be appreciated that for a given mammal, oligonucleotide sequences would need to be selected which hybridize specifically with particular iso-tRNA sequences expressed by the mammal. Such selection is well within the realm of one of ordinary skill in the art based on a known iso-tRNA sequence. For example, in the case of a mouse, exemplary oligonucleotide sequences which may be used include those described in Gauss and Sprinzel (1983, Nucleic Acids Res. 11 (1)) hereby incorporated by reference. In this respect, the oligonucleotide sequences may be selected from the group consisting of:
5'-TAAGGACTGTAAGACTT-3' (SEQ ID NO:13) for Ala 0 "
5'-CGAGCCAGCCAGGAGTC-3' (SEQ ID NO:14) for Arg 00 *
5'-CTAGATTGGCAGGAATT-3' (SEQ ID NO:15) for Asn MC
5'-TAAG ATATATAGATTAT-3' (SEQ ID NO:16) for Asp wc
5'-AAGTCTTAGTAGAGATT-3' (SEQ ID NO:17) for Cys TGC
5'-TATTTCTACACAGCATT-3' (SEQ ID NO:18) for Glu 0 ™
5'-CTAGGACAATAGGAATT-3' (SEQ ID NO:19) for Gln 0 *
5'-TACTCTCTTCTGGGTTT-3' (SEQ ID NO:20) for Gly TCA
5'-TGCCGTGACTCGGATTC-3' (SEQ ID NO:21) for His" 0
5'-TAGAAATAAGAGGGCTT-3' (SEQ ID NO:22) for Ile ATC
5'-TACTTTTATTTGGATTT-3' (SEQ ID NO:23) for Leu™
5'-TATTAGGGAGAGGATTT-3' (SEQ ID NO:24) for Leu 0 "
5'-TCACTATGGAGATTTTA-3' (SEQ ID NO:25) for Lys^
5'-CGCCCAACGTGGGGCTC-3' (SEQ ID NO:26) for Lys** 0
5'-TAGTACGGGAAGGATTT-3' (SEQ ID NO:27) for Met elong
5'-TGTTTATGGGATACAAT-3' (SEQ ID NO:28) for Phe™
5'-TCAAGAAGAAGGAGCTA-3' (SEQ ID NO:29) for Pro CCA
5'-GGGCTCGTCCGGGATTT-3' (SEQ ID NO:30) for Pro 001
5'-ATAAGAAAGGAAGATCG-3' (SEQ ID NO:31) for Ser* 30
5'-TGTCTTGAGAAGAGAAG-3' (SEQ ID NO:32) for Thr ACA
5'-TGGTAAAAAGAGGATTT-3' (SEQ ID NO:33) for Tyr TAC
5'-TCAGAGTGTTCATTGGT-3' (SEQ ID NO:34) for Val GTA

Typically, the relative abundance of iso-tRNA species may be determined by blotting techniques that include a step whereby sample RNA or tRNA extract is immobilized on a matrix (preferably a synthetic membrane such as nitrocellulose), a hybridization step, and a detection step. Northern blotting may be used to identify an RNA sequence that is complementary to a nucleic acid probe. Alternatively, dot blotting and slot blotting can be used to identify complementary DNA/RNA or RNA/RNA nucleic acid sequences. Such techniques are well known by those skilled in the art, and have been described in Ausubel, et al (supra) at pages 2.9.1 through 2.9.20.

According to such methods, a sample of tRNA immobilized on a matrix is hybridized under stringent conditions to a complementary nucleotide sequence (such as those mentioned above) which is labeled, for example, radioactively, enzymatically or fluorochromatically.

"Stringency" as used herein, refers to the temperature and ionic strength conditions, and presence or absence of certain organic solvents, during hybridization. The higher the stringency, the higher will be the degree of complementarity between the immobilized nucleotide sequences (i.e., iso-tRNA) and the labeled oligonucleotide sequence. For a discussion of typical stringent conditions that may be used, see CURRENT PROTOCOLS IN MOLECULAR BIOLOGY supra at pages 2.10.1 to 2.10.16, and Sambrook et al in MOLECULAR CLONING. A LABORATORY MANUAL (Cold Spring Harbor Press, 1989), hereby incorporated by reference, at sections 1.101 to 1.104.

While stringent washes are typically carried out at temperatures from about 42°C to 68°C, one skilled in the art will appreciate that other temperatures may be suitable for stringent conditions. Maximum hybridization typically occurs at about 20° to 25° below the Tm for formation of a DNA-DNA hybrid. It is well known in the art that the Tm is the melting temperature, or temperature at which two complementary nucleic acid sequences dissociate. Methods for estimating Tm are well known in the art (see CURRENT PROTOCOLS IN MOLECULAR BIOLOGY supra at page 2.10.8). Maximum hybridization typically occurs at about 10° to 15° below the Tm for a DNA-RNA hybrid.
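As a rough illustration of how a Tm estimate feeds into a choice of hybridization temperature, the Python sketch below applies the widely used rule of thumb for short oligonucleotides (2°C per A or T, 4°C per G or C, often called the Wallace rule). This rule is an assumption introduced here for illustration and is not necessarily the method given in the cited reference; the function names are hypothetical. The offsets below Tm are the ones quoted in the paragraph above, and combining them with a short-oligonucleotide estimate is itself only a sketch.

```python
def wallace_tm(oligo):
    """Rule-of-thumb melting temperature (degrees C) for a short oligonucleotide:
    2 degrees per A or T plus 4 degrees per G or C."""
    oligo = oligo.upper().replace("U", "T")
    at = sum(oligo.count(base) for base in "AT")
    gc = sum(oligo.count(base) for base in "GC")
    if at + gc != len(oligo):
        raise ValueError("oligo contains characters other than A, C, G, T/U")
    return 2 * at + 4 * gc


def hybridization_window(tm, hybrid="DNA-RNA"):
    """Temperature range (low, high) at which maximum hybridization is expected,
    using the offsets below Tm stated in the text for each hybrid type."""
    offsets = {"DNA-DNA": (25, 20), "DNA-RNA": (15, 10)}
    low, high = offsets[hybrid]
    return tm - low, tm - high


# Probe sequence of SEQ ID NO:13 as listed above.
tm = wallace_tm("TAAGGACTGTAAGACTT")
window = hybridization_window(tm, "DNA-RNA")
```

In practice the estimate depends on salt concentration, probe length and any organic solvents present, so values from such a rule would be treated only as a starting point for empirical optimization.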

Other stringent conditions are well known in the art. A skilled addressee will recognize that various factors can be manipulated to optimize the specificity of the hybridization. Optimization of the stringency of the final washes can serve to ensure a high degree of hybridization.

Methods for detecting labeled nucleotide sequences hybridized to an immobilized nucleotide sequence are well known to practitioners in the art. Such methods include autoradiography, chemiluminescent, fluorescent and colorimetric detection.

Advantageously, the relative abundance of an iso-tRNA in two or more cells or tissues may be determined by comparing the respective levels of binding of a labeled nucleotide sequence specific for the iso-tRNA to equivalent amounts of immobilized RNA obtained from the two or more cells or tissues. Similar comparisons are suitably carried out to determine the respective relative abundance of other iso-tRNAs in the two or more cells or tissues. One of ordinary skill in the art will thereby be able to determine a relative tRNA abundance table (see for example TABLE 2) for different cells or tissues. From such comparisons, one or more synonymous codons may be selected such that the or each synonymous codon corresponds to an iso-tRNA which, when compared to an iso-tRNA corresponding to an existing codon of the parent nucleic acid sequence, is in higher abundance in the target cell or tissue relative to other cells or tissues of the mammal.

Advantageously, a synonymous codon is
selected such that its corresponding iso-tRNA in the
target cell or tissue is at a level which is at least
110%, preferably at least 200%, more preferably at
least 500%, and most preferably at least 1000%, of
that expressed in the or each other cell or tissue of
the mammal.
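
The selection criterion described above may be sketched, by
way of example only, as a simple filter over relative
iso-tRNA levels. The abundance values below are invented
for illustration and are not taken from TABLE 2. The sketch
also assumes, as a simplification, that each codon is read
by a single iso-tRNA, so that levels can be keyed by codon.
The function name and parameters are hypothetical.

```python
from typing import Dict, List, Sequence


def codons_enriched_in_target(
    target_levels: Dict[str, float],
    other_levels: Sequence[Dict[str, float]],
    synonymous: Sequence[str],
    min_ratio: float = 1.10,
) -> List[str]:
    """Return synonymous codons whose iso-tRNA level in the target cell
    is at least min_ratio times the level in each other cell or tissue.

    min_ratio = 1.10, 2.0, 5.0 and 10.0 correspond to the 110%, 200%,
    500% and 1000% thresholds mentioned in the text.
    """
    selected = []
    for codon in synonymous:
        target = target_levels.get(codon, 0.0)
        if target <= 0.0:
            continue  # no detectable iso-tRNA in the target cell
        if all(target >= min_ratio * levels.get(codon, 0.0)
               for levels in other_levels):
            selected.append(codon)
    return selected


# Hypothetical relative levels for three leucine codons.
LEU = ["cuu", "cua", "cug"]
TARGET = {"cuu": 5.0, "cua": 3.0, "cug": 1.0}
OTHER = {"cuu": 1.0, "cua": 2.0, "cug": 3.0}
```

Raising min_ratio narrows the selection to codons whose
iso-tRNAs are most strongly enriched in the target cell.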

Suitably, synonymous codons for selective
expression of a protein in a differentiated cell,
preferably a differentiated keratinocyte, are selected
from the group consisting of gca (Ala), cuu (Leu) and
cua (Leu).

Synonymous codons for selective expression
of a protein in an undifferentiated cell, preferably
an undifferentiated keratinocyte, are suitably
selected from the group consisting of cga (Arg), cci
(Pro) and aag (Asn).

Analysis of codon usage

Alternatively, synonymous codons may be
selected by analyzing the frequency at which codons
are used by genes expressed in (i) particular cells or
tissues, (ii) substantially all cells or tissues of
the mammal, or (iii) an organism which may infect
particular cells or tissues of the mammal.

Codon frequency tables as well as suitable
methods for determining frequency of codon usage in an
organism are described, for example, in an article by
Sharp et al. (1988, Nucleic Acids Res. 16 8207-8211)
which is hereby incorporated by reference.
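
As a minimal sketch of how a codon frequency table might be
tallied from a coding sequence, the following may be
considered. It is not the method of the cited article. The
function names are hypothetical, and the input is assumed
to be an in-frame coding sequence given as DNA or RNA.

```python
from collections import Counter
from typing import Dict


def codon_counts(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence (DNA or RNA)."""
    cds = cds.lower().replace("t", "u")  # use the document's RNA notation
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def frequency_per_thousand(counts: Counter) -> Dict[str, float]:
    """Convert raw codon counts to occurrences per 1000 codons."""
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in counts.items()}
```

Counts from several genes of the same cell, tissue or
organism can be pooled by adding the Counter objects before
converting to frequencies.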

The relative level of gene expression
(e.g., detectable protein expression vs no detectable
protein expression) can provide an indirect measure of
the relative abundance of specific iso-tRNAs expressed
in different cells or tissues. For example, a virus
may be capable of propagating within a first cell or
tissue (which may include a cell or tissue at a
specific stage of differentiation) but may be
substantially incapable of propagating in a second
cell or tissue (which may include a cell or tissue at
another stage of differentiation). Comparison of the
pattern of codon usage by genes of the virus with the
pattern of codon usage by genes expressed in the
second cell or tissue may thus provide indirectly a
set of synonymous codons which correspond to iso-tRNAs
expressed at relatively high abundance in the first
cell or tissue relative to the second cell or tissue.
Simultaneously, the above comparison
may also provide indirectly a set of synonymous codons
which correspond to iso-tRNAs expressed at relatively
high abundance in the second cell or tissue relative
to the first cell or tissue.
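
The comparison of codon usage patterns described above may
be sketched, purely for illustration, as follows. Within
one family of synonymous codons, the share of each codon in
the viral genes is compared with its share in genes of the
second cell or tissue. The counts, the 0.25 cut-off and the
function names are hypothetical assumptions of this sketch.

```python
from typing import Dict, List, Sequence, Tuple


def synonymous_fraction(counts: Dict[str, int],
                        family: Sequence[str]) -> Dict[str, float]:
    """Share of each codon among the synonymous codons of one amino acid."""
    total = sum(counts.get(codon, 0) for codon in family)
    if total == 0:
        return {codon: 0.0 for codon in family}
    return {codon: counts.get(codon, 0) / total for codon in family}


def differential_codons(
    virus_counts: Dict[str, int],
    cell_counts: Dict[str, int],
    family: Sequence[str],
    min_difference: float = 0.25,
) -> Tuple[List[str], List[str]]:
    """Return (codons favoured by the virus, codons favoured by the cell).

    A codon is 'favoured' when its synonymous share exceeds the share in
    the other gene set by at least min_difference.
    """
    virus = synonymous_fraction(virus_counts, family)
    cell = synonymous_fraction(cell_counts, family)
    by_virus = sorted(c for c in family if virus[c] - cell[c] >= min_difference)
    by_cell = sorted(c for c in family if cell[c] - virus[c] >= min_difference)
    return by_virus, by_cell
```

Under the reasoning of the preceding paragraph, codons
favoured by the virus would be candidates for iso-tRNAs
abundant in the first cell or tissue, and codons favoured
by the cell would be candidates for the second.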

From the foregoing, a synonymous codon
according to the invention may correspond to a codon
including, but not limited to, (1) a codon used at
relatively high frequency by genes, preferably highly
expressed genes, of the target cell or tissue, (2) a
codon used at relatively high frequency by genes,
preferably highly expressed genes, of the or each
other cell or tissue, (3) a codon used at relatively
high frequency by genes, preferably highly expressed
genes, of the mammal, (4) a codon used at relatively
low frequency by genes of the target cell or tissue,
(5) a codon used at relatively low frequency by genes
of the or each other cell or tissue, (6) a codon used
at relatively low frequency by genes of the mammal,
(7) a codon used at relatively high frequency by genes
of another organism, and (8) a codon used at
relatively low frequency by genes of another organism.

For example, codons used at a relatively
high frequency by genes, preferably highly expressed
genes, of the mammal may be selected from the group
consisting of: cuc (Leu), cuu (Leu), cug (Leu), uua
(Leu), uug (Leu); cgg (Arg), cgc (Arg), aga (Arg), agg
(Arg); agu (Ser), agc (Ser), ucu (Ser), ucc (Ser), and
uca (Ser). Alternatively, such codons may include auu
(Ile), auc (Ile); guu (Val), guc (Val), gug (Val); acu
(Thr), acc (Thr), aca (Thr); gcu (Ala), gcc (Ala), gca
(Ala); cag (Gln); ggc (Gly), gga (Gly), ggg (Gly).

Codons used at a relatively low frequency
by genes of the mammal are described, for example, in
Sharp et al. (1988, supra). Such codons may comprise
cua (Leu); cga (Arg), cgu (Arg); ucg (Ser).
Alternatively, such codons may include aua (Ile); gua
(Val); acg (Thr); gcg (Ala); caa (Gln); ggu (Gly).

Construction of synthetic nucleic acid sequences

The step of replacing existing codons with
synonymous codons may be effected by any suitable
technique. For example, in vitro mutagenesis methods
may be employed which are well known to those of skill
in the art. Suitable mutagenesis methods are
described for example in the relevant sections of
Ausubel, et al. (supra) and of Sambrook, et al.
(supra) which are hereby incorporated by reference.
Alternatively, suitable methods for altering DNA are
set forth, for example, in U.S. Patent Nos 4,184,917,
4,321,365 and 4,351,901, which are hereby incorporated
by reference. Instead of in vitro mutagenesis, the
second nucleic acid sequence may be synthesized de
novo using readily available machinery. Sequential
synthesis of DNA is described, for example, in U.S.
Patent No 4,293,652, which is hereby incorporated by
reference. However, it should be noted that the
present invention is not dependent on and not directed
to any one particular technique for replacing
existing codons with synonymous codons.

It is not necessary to replace all the
existing codons of the parent nucleic acid sequence
with synonymous codons each corresponding to an
iso-tRNA expressed in relatively high abundance in the
target cell compared to other cells. Increased
expression may be accomplished even with partial
replacement. Preferably, the replacing step affects
5%, 10%, 15%, 20%, 25%, 30%, more preferably 35%, 40%,
50%, 60%, 70% or more of the existing codons of the
parent nucleic acid sequence.
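
Partial replacement of existing codons may be illustrated
in silico by the following sketch, which is offered as an
example only and not as the technique of the invention. It
assumes the standard genetic code, replaces codons from the
5' end until a chosen fraction has been replaced, and
checks that the encoded protein is unchanged. The function
names and the mapping of amino acids to preferred codons
are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# Standard genetic code, codons ordered u, c, a, g at each position.
BASES = "ucag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(seq: str) -> List[str]:
    """Split an in-frame coding sequence (DNA or RNA) into RNA codons."""
    seq = seq.lower().replace("t", "u")
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    """Translate a coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(seq))


def replace_codons(parent: str, preferred: Dict[str, str],
                   max_fraction: float = 1.0) -> Tuple[str, float]:
    """Replace existing codons with preferred synonymous codons.

    preferred maps a one-letter amino acid to the codon to use for it.
    At most max_fraction of all codons are replaced.
    Returns (synthetic sequence, fraction of codons actually replaced).
    """
    codons = split_codons(parent)
    budget = int(max_fraction * len(codons))
    replaced = 0
    out = []
    for codon in codons:
        amino_acid = CODON_TABLE[codon]
        new = preferred.get(amino_acid, codon)
        if CODON_TABLE[new] != amino_acid:
            raise ValueError(f"{new} is not synonymous for {amino_acid}")
        if new != codon and replaced < budget:
            out.append(new)
            replaced += 1
        else:
            out.append(codon)
    synthetic = "".join(out)
    if translate(synthetic) != translate(parent):
        raise AssertionError("encoded protein changed")
    return synthetic, replaced / len(codons)
```

Replacing from the 5' end is an arbitrary choice made for
brevity; the text does not prescribe which codons of the
parent sequence are to be replaced.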

The parent nucleic acid sequence is
preferably a natural gene. By "natural gene" is meant
a gene that naturally encodes the protein. However,
it is possible that the parent nucleic acid sequence
encodes a protein that is not naturally-occurring but
has been engineered using recombinant techniques.

The parent nucleic acid sequence need not
be obtained from the mammal but may be obtained from
any suitable source such as from a eukaryotic or
prokaryotic organism. For example, the parent nucleic
acid sequence may be obtained from another mammal or
other animal. Alternatively, the parent nucleic acid
sequence may be obtained from a pathogenic organism.
In such a case, a natural host of the pathogenic
organism is preferably a mammal. For example, the
pathogenic organism may be a yeast, bacterium or
virus.

For example, suitable proteins which may be
used for selective expression in accordance with the
invention include, but are not limited to, the cystic
fibrosis transmembrane conductance regulator (CFTR)
protein, and adenosine deaminase (ADA). In the case
of CFTR, a parent nucleic acid sequence encoding the
CFTR protein which may be utilized to produce the
synthetic nucleic acid sequence is described, for
example, in Riordan et al. (1989, Science 245
1066-1073), and in the GenBank database under Accession
No. HUMCFTRM, which are hereby incorporated by reference.

The term "nucleic acid sequence" as used 
herein designates mRNA, RNA, cRNA, cDNA or DNA. 

Regulatory nucleotide sequences which may
be utilized to regulate expression of the synthetic
nucleic acid sequence include, but are not limited to,
a promoter, an enhancer, and a transcriptional
terminator. Such regulatory sequences are well known
to those of skill in the art.

Synthetic nucleic acid sequences according
to the invention may be operably linked to one or more
regulatory sequences in the form of an expression
vector. By "vector" is meant a nucleic acid molecule,
preferably a DNA molecule derived, for example, from a
plasmid, bacteriophage, or mammalian or insect virus,
into which a synthetic nucleic acid sequence may be
inserted or cloned. A vector preferably contains one
or more unique restriction sites and may be capable of
autonomous replication in a defined host cell
including the target cell or tissue or a precursor
cell or precursor tissue thereof, or be integratable
with the genome of the defined host such that the
cloned sequence is reproducible. Thus, by "expression
vector" is meant any autonomous element capable of
directing the synthesis of a protein. Such expression
vectors are well known by practitioners in the art.

The term "precursor cell" as used herein 
refers to a cell that gives rise to the target cell. 

The invention also contemplates synthetic
nucleic acid sub-sequences encoding desired portions
of the protein. A nucleic acid sub-sequence encodes a
domain of the protein having a function associated
therewith and preferably encodes at least 10, 20, 50,
100, 150, or 500 contiguous amino acids of the
protein.

The step of introducing the synthetic
nucleic acid sequence into a target cell will differ
depending on the intended use and/or species, and may
involve non-viral and viral vectors, cationic
liposomes, retroviruses and adenoviruses such as, for
example, described in Mulligan, R.C. (1993, Science
260 926-932) which is hereby incorporated by
reference. Such methods may include:

(i) Local application of the synthetic
nucleic acid sequence by injection (Wolff et al.,
1990, Science 247 1465-1468, which is hereby
incorporated by reference), surgical implantation,
instillation or any other means. This method may also
be used in combination with local application by
injection, surgical implantation, instillation or any
other means, of cells responsive to the protein
encoded by the synthetic nucleic acid sequence so as
to increase the effectiveness of that treatment. This
method may also be used in combination with local
application by injection, surgical implantation,
instillation or any other means, of another factor or
factors required for the activity of said protein.

(ii) General systemic delivery by
injection of DNA (Calabretta et al., 1993, Cancer
Treat. Rev. 19 169-179, which is hereby incorporated
by reference), or RNA, alone or in combination with
liposomes (Zhu et al., 1993, Science 261 209-212,
which is hereby incorporated by reference), viral
capsids or nanoparticles (Bertling et al., 1991,
Biotech. Appl. Biochem. 13 390-405, which is hereby
incorporated by reference) or any other mediator of
delivery. Improved targeting might be achieved by
linking the synthetic nucleic acid sequence to a
targeting molecule (the so-called "magic bullet"
approach employing, for example, an antibody), or by
local application by injection, surgical implantation
or any other means, of another factor or factors
required for the activity of the protein produced from
said synthetic nucleic acid sequence, or of cells
responsive to said protein.

(iii) Injection or implantation or delivery
by any means, of cells that have been modified ex vivo
by transfection (for example, in the presence of
calcium phosphate: Chen et al., 1987, Mole. Cell
Biochem. 7 2745-2752, or of cationic lipids and
polyamines: Rose et al., 1991, BioTech. 10 520-525,
which articles are hereby incorporated by reference),
infection, injection, electroporation (Shigekawa et
al., 1988, BioTech. 6 742-751, which is hereby
incorporated by reference) or any other way so as to
increase the expression of said synthetic nucleic acid
sequence in those cells. The modification may be
mediated by plasmid, bacteriophage, cosmid, viral
(such as adenoviral or retroviral; Mulligan, 1993,
Science 260 926-932; Miller, 1992, Nature 357 455-460;
Salmons et al., 1993, Hum. Gen. Ther. 4 129-141, which
articles are hereby incorporated by reference) or
other vectors, or other agents of modification such as
liposomes (Zhu et al., 1993, Science 261 209-212,
which is hereby incorporated by reference), viral
capsids or nanoparticles (Bertling et al., 1991,
Biotech. Appl. Biochem. 13 390-405, which is hereby
incorporated by reference), or any other mediator of
modification. The use of cells as a delivery vehicle
for genes or gene products has been described by Barr
et al., 1991, Science 254 1507-1512 and by Dhawan et
al., 1991, Science 254 1509-1512, which articles are
hereby incorporated by reference. Treated cells may
be delivered in combination with any nutrient, growth
factor, matrix or other agent that will promote their
survival in the treated subject.

In yet another aspect, the invention
provides a pharmaceutical composition comprising the
synthetic nucleic acid sequences of the invention and a
pharmaceutically acceptable carrier.
By "pharmaceutically-acceptable carrier" is
meant a solid or liquid filler, diluent or
encapsulating substance that may be safely used in
systemic administration. Depending upon the
particular route of administration, a variety of
pharmaceutically acceptable carriers, well known in
the art, may be used. These carriers may be selected
from a group including sugars, starches, cellulose and
its derivatives, malt, gelatin, talc, calcium sulfate,
vegetable oils, synthetic oils, polyols, alginic acid,
phosphate buffered solutions, emulsifiers, isotonic
saline, and pyrogen-free water.

Any suitable technique may be employed for
determining expression of the protein from said
synthetic nucleic acid sequence in a particular cell
or tissue. For example, expression can be measured
using an antibody specific for the protein of interest
or portion thereof. Such antibodies and measurement
techniques are well known to those skilled in the art.

Applications 

In one embodiment of the present invention,
the target cell is suitably a differentiated cell.
Advantageously, the protein which is desired to be
selectively expressed in the differentiated cell is
not expressible in a precursor cell thereof (such as
an undifferentiated or less differentiated cell of the
mammal) from a parent nucleic acid sequence at a level
sufficient to effect a particular function associated
with said protein. In this embodiment, the step of
replacing at least one existing codon with a
synonymous codon is characterized in that the
synonymous codon corresponds to an iso-tRNA which,
when compared to the iso-tRNA corresponding to the at
least one existing codon, is in relatively higher
abundance in the differentiated cell compared to the
precursor cell. Accordingly, a synthetic nucleic acid
sequence is produced having altered translational
kinetics compared to the parent nucleic acid sequence
wherein the protein is expressible in the
differentiated cell at a level sufficient to effect a
particular function associated with said protein, but
wherein the protein is not expressible in the
precursor cell at a level sufficient to effect said
function.

As used herein, the term "function" refers
to a biological or therapeutic function.

The above embodiment may be utilized
advantageously for somatic gene therapy where
overexpression of a protein in undifferentiated cells
such as stem cells has undesirable consequences
including death or differentiation of the stem cells.
In such a case, a suitable protein may include cystic
fibrosis transmembrane conductance regulator (CFTR)
protein, and adenosine deaminase (ADA).

The differentiated cell may comprise a cell
of any lineage including a cell of epithelial,
hemopoietic or neural origin. For example, the
differentiated cell may be a mature differentiated
keratinocyte.

Targeting expression of a protein to
progeny of a stem cell but not to the stem cell itself

The synthetic nucleic acid sequence
produced above may be transfected directly into the
differentiated cell for the desired function or,
alternatively, transfected into the precursor cell.
For example, in the case of ADA deficiency, expression
of ADA in stem cells may result in loss of stem
phenotype which is undesirable. However, an
advantageous therapy may reside in transducing
autologous marrow stem cells with a synthetic nucleic
acid sequence operably linked to one or more
regulatory sequences, wherein existing codons of the
wild type ADA gene have been replaced with synonymous
codons each corresponding to an iso-tRNA expressed in
relatively high abundance in differentiated
lymphocytes compared to the marrow stem cells. The
transduced stem cells may then be reinfused into the
patient. This approach will result in transduced
marrow stem cells which are not capable of expressing
ADA themselves, but which are able to give rise to a
renewable population of differentiated lymphocytes
which are capable of expressing ADA at levels
sufficient to permit a therapeutic effect. In this
regard, a suitable cell source for this purpose may
comprise stem cells isolated as CD34 positive cells
from a patient's peripheral blood or marrow. For gene
delivery, a suitable vector may include a retrovirus
or adeno-associated virus.

Alternatively, in the case of inducing cell
mediated immunity, dendritic cells are important
antigen presenting cells (APC) but, once activated,
have a very limited life span for antigen presentation
of between 14 and 21 days. Consequently, dendritic cells
provide relatively short-term immune stimulation that
may not be optimal. However, in accordance with the
present invention, a long-term immune stimulation may
be provided by transducing autologous bone
marrow-derived CD34 positive dendritic cell precursors
with a synthetic nucleotide sequence encoding an antigen
such as the melanoma antigen MART-1, wherein the
synthetic sequence is operably linked to one or more
regulatory sequences, and wherein existing codons of a
wild type nucleotide sequence encoding MART-1 have
been replaced with synonymous codons each
corresponding to an iso-tRNA expressed in relatively
high abundance in dendritic cells compared to the
dendritic cell precursors. The transduced dendritic
cell precursors may then be reinfused into the
patient. This approach will result in transduced
dendritic cell precursors which are not capable of
expressing MART-1 themselves, but which are able to
give rise to a renewable population of dendritic cells
which are capable of expressing MART-1 at levels
sufficient to permit a lifelong intermittent
restimulation of a cytotoxic T lymphocyte (CTL)
response to the MART-1 antigen.

Targeting expression of a protein to a stem
cell but not to progeny of the stem cell

In an alternate embodiment, the target cell
may be an undifferentiated cell wherein the protein is
not expressible in said undifferentiated cell, from a
parent nucleic acid sequence encoding the protein, at
a level sufficient to effect a particular function
associated with the protein. In such a case, at least
one existing codon of the parent nucleic acid sequence
is replaced with a synonymous codon corresponding to
an iso-tRNA which, when compared to the iso-tRNA
corresponding to the at least one existing codon, is
in relatively higher abundance in the undifferentiated
cell compared to a differentiated cell. This results
in a synthetic nucleic acid sequence having altered
translational kinetics compared to said parent nucleic
acid sequence wherein the protein is expressible in
the undifferentiated cell at a level sufficient to
effect a particular function associated with the
protein, but wherein the protein is not expressible in
differentiated cells derived from the undifferentiated
cell at a level sufficient to effect said function.

This alternate embodiment may, by way of
example, be used to permit expression of a
transcriptional regulatory protein which when
expressed in a particular undifferentiated cell or
stem cell facilitates differentiation of the stem cell
along a particular cell lineage. It will be
appreciated that in such a case, the regulatory
protein is normally expressed from a gene in which the
existing codons correspond to iso-tRNAs which are in
relatively low abundance in the stem cell compared to
other iso-tRNAs and that therefore the protein is not
capable of being expressed at levels sufficient for
commitment of the stem cell to differentiate along a
particular cell lineage. It will also be apparent
that such commitment to differentiate along a
particular cell lineage may be utilized to prevent
production of a particular lineage of cells such as
cancer cells.

Alternatively, the method according to this
embodiment may be used to express a transcriptional
regulatory protein that is involved in the production
of a therapeutic agent or agents. Such a protein may
include, for example, NF-kappa-B transcription factor
p65 subunit (NF-kappa-B p65) which is involved in the
production of interleukin-2 (IL-2), interleukin-3
(IL-3) and granulocyte and macrophage colony stimulating
factor (GMCSF). NF-kappa-B p65 is encoded naturally by
a nucleotide sequence comprising a number of existing
codons each corresponding to an iso-tRNA expressed in
relatively low abundance in stem cells. Accordingly,
such sequence may be used as the parent nucleic acid
sequence according to this embodiment. A suitable
nucleotide sequence encoding this protein is
described, for example, in Lyle et al. (1994, Gene 138
265-266) and in the EMBL database under Accession No
HSNFKB65A which are hereby incorporated by reference.

A suitable undifferentiated cell which may 
be utilized in accordance with the present embodiment 
includes but is not limited to a stem cell, such as a 

CD34 positive hemopoietic stem cell.

The present embodiment may also be used
advantageously for gene therapy where ongoing
regulated expression of a transgene is desirable. For
example, secure but reversible regulation of fertility
is desirable in veterinary practice and in humans.
Such regulation may be effected by transducing
autologous breast ductal epithelial cells with a
synthetic nucleic acid encoding a luteinising hormone
(LH) antagonist or a luteinising hormone releasing
hormone (LHRH) antagonist under the control of one or
more regulatory sequences. The synthetic nucleic acid
may be produced by replacing existing codons of a
parent nucleic acid with synonymous codons
corresponding to iso-tRNAs expressed in relatively
high abundance in resting breast ductal epithelial
cells compared to differentiated cells arising
therefrom. Once the transduced cells are implanted
back into the patient, expression may be switched off
by oral administration of progestagen, forcing the
differentiation of the majority of the stem cells and
loss of expression of the antagonist. Once pregnancy
is established, the suppression would be
self-sustaining by the naturally produced progestagen. The
iso-tRNA composition of resting and oestrogen driven
breast epithelial cells may be established by first
obtaining resting cells from reduction mammoplasty,
and determining the cellular tRNA composition in the
presence and absence of oestrogen. The synthetic
nucleic acid sequence may be introduced into
autologous resting epithelial cells by cell
electroporation ex vivo, and the transduced cells may
be subsequently transplanted subcutaneously into the
patient. Progestagen may be administered as required
to reverse regulation of fertility.

Targeting expression of a toxin to a tumor
cell but not to any other cells of the mammal

Many toxins and drugs are available that
can kill tumor cells. However, these toxins and drugs
are generally toxic for all dividing cells. This
problem may nevertheless be ameliorated by
establishing the isoacceptor tRNA composition in a
tumor clone, and constructing a synthetic toxin gene
(e.g., ricin gene) or a synthetic anti-proliferation
gene (e.g., the tumor suppressor p53) using synonymous
codons corresponding to iso-tRNAs expressed at
relatively high abundance in the tumor clone compared
to normal dividing cells of the mammal. The synthetic
gene is then introduced into the patient by suitable
means to selectively express the synthetic gene in
tumor cells.

Alternatively, a chemotherapy enhancing
product gene (i.e., a drug resistance gene, e.g., the
multi-drug resistance gene) using a codon pattern
unlikely to be expressed in the tumor efficiently may
be employed.

Targeting gene therapy to control body fat

Leptins are proteins known to control
satiety. By analogy with animal data, however, if too
much leptin is administered to a patient,
leptin-induced starvation might occur. Advantageously, a
synthetic gene encoding leptin may be constructed
including synonymous codons corresponding to iso-tRNAs
expressed at relatively high levels in activated
adipocytes compared to non-activated adipocytes. The
synthetic gene may then be introduced into the patient
by suitable means such that leptin is only expressed
substantially in activated adipocytes as opposed to
non-activated adipocytes. As body fat turnover
diminishes under the influence of leptin-reduced
appetite, the metabolic activity of the adipocytes
falls and the leptin production decreases
correspondingly.

Targeting expression of a protein to a
stage of the cell cycle

In another embodiment of the invention, the
target cell may be a non-cycling cell. In this case,
the protein which is desired to be selectively
expressed in the non-cycling cell is expressible in a
cycling cell of the mammal from a parent nucleic acid
sequence at a level sufficient to effect a particular
function associated with the protein. The synonymous
codons are selected such that each corresponds to an
iso-tRNA which, when compared to the iso-tRNA
corresponding to the at least one existing codon, is
in higher abundance in the non-cycling cell compared
to the cycling cell. Accordingly, a synthetic nucleic
acid sequence is produced having altered translational
kinetics compared to the parent nucleic acid sequence
wherein the protein is expressible in the non-cycling
cell at a level sufficient to effect a particular
function associated with said protein, but wherein the
protein is not expressible in the cycling cell at a
level sufficient to effect said function.

The term "non-cycling cell" as used herein
refers to a cell that has withdrawn from the cell
cycle and has entered the G0 state. In this state, it
is well known that transcription of endogenous genes
and protein translation are at substantially reduced
levels compared to phases of the cell cycle, namely
G1, S, G2 and M.

By "cycling cell" is meant a cell which is
in one of the above phases of the cell cycle.

Expressing a protein in a target cell or
tissue by in vivo expression of iso-tRNAs in the
target cell or tissue

In another aspect, the invention extends to
a method wherein a protein may be selectively
expressed in a target cell by introducing into the
cell an auxiliary nucleic acid sequence capable of
expressing therein one or more isoaccepting transfer
RNAs which are not expressed in relatively high
abundance in the cell, but which are rate limiting for
expression of the protein from a parent nucleic acid
sequence to a level sufficient for effecting a
function associated with the protein. In this
embodiment, introduction of the auxiliary nucleic acid
sequence in the cell changes the translational
kinetics of the parent nucleic acid sequence such that
said protein is expressed at a level sufficient to
effect a function associated with the protein.
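
As a rough illustration of how candidate rate-limiting
iso-tRNAs might be shortlisted for supply by an auxiliary
nucleic acid sequence, the sketch below lists the codons
of a parent sequence whose corresponding iso-tRNA level in
the target cell falls below a chosen threshold. The levels,
the threshold and the function name are hypothetical, and
whether a given iso-tRNA is in fact rate limiting would
need to be established experimentally.

```python
from collections import Counter
from typing import Dict, List


def low_abundance_codons(cds: str, trna_levels: Dict[str, float],
                         threshold: float) -> List[str]:
    """Return codons used by the parent sequence whose iso-tRNA level in
    the target cell is below threshold, most frequently used first.

    Codons absent from trna_levels are treated as having level 0.0.
    """
    cds = cds.lower().replace("t", "u")
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    used = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))
    low = [codon for codon in used
           if trna_levels.get(codon, 0.0) < threshold]
    # Sort by descending usage, then alphabetically for a stable order.
    return sorted(low, key=lambda codon: (-used[codon], codon))
```

Codons that are both frequently used in the parent sequence
and poorly served in the target cell appear first in the
returned list.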

The step of introducing the auxiliary
nucleic acid sequence into the target cell or a tissue
comprising a plurality of these cells may be effected
by any suitable means. For example, analogous
methodologies for introduction of the synthetic
nucleic acid sequence referred to above may be
employed for delivery of the auxiliary nucleic acid
sequence into said target cell.

Assembly of virus particles in cells which
do not normally permit assembly of virus particles

In yet another aspect, the invention
extends to a method for producing a virus particle in
a cycling eukaryotic cell. The virus particle will
comprise at least one protein necessary for virus
assembly, wherein the at least one protein is not
expressed in the cell from a parent nucleic acid
sequence at a level sufficient to permit virus
assembly therein. This method is characterized by
replacing at least one existing codon of the parent
nucleic acid sequence with a synonymous codon to
produce a synthetic nucleic acid sequence having
altered translational kinetics compared to the parent
nucleic acid sequence such that the at least one
protein is expressible from the synthetic nucleic acid
sequence in the cell at a level sufficient to permit
virus assembly therein. The synthetic nucleic acid
sequence so produced is operably linked to one or more
regulatory nucleotide sequences and is then introduced
into the cell or a precursor cell thereof. The at
least one protein is expressed subsequently in the
cell in the presence of other viral proteins required
for assembly of the virus particle to thereby produce
the virus particle.

Advantageously, the synonymous codon 
corresponds to an iso-tRNA expressed at relatively 
high level in the cell compared to the iso-tRNAs 
corresponding to the existing codons.

The cycling cell may be any cell in which
the virus is capable of replication. Suitably, the
cycling cell is a eukaryotic cell. Preferably, the
cycling cell for production of the virus particle is a
eukaryotic cell line capable of being grown in vitro
such as, for example, CV-1 cells, COS cells, yeast or
Spodoptera cells.

Suitably, the at least one protein of the
virus particle comprises viral capsid proteins. Preferably,
the viral capsid proteins comprise L1 and/or L2
proteins of papillomavirus.

The other viral proteins required for 

10 assembly of the virus particle in the cell may be 
expressed from another nucleic acid sequence (s) which 
suitably contain the rest of the viral genome. In the 
case of the at least one protein comprising LI and/or 
L2 of papillomavirus, said other nucleic acid 

15 sequence (s) preferably comprises the papillomavirus 
genome without the nucleotide sequences encoding LI 
and/or L2 . 

In yet a further aspect of the invention, 
there is provided a method for producing a virus 

20 particle in a cycling cell, said virus particle 
comprising at least one protein necessary for assembly 
of said virus particle, wherein said at least one 
protein is not expressed in said cell from a parent 
nucleic acid sequence at a level sufficient to permit 

25 virus assembly therein, and wherein at least one' 
existing codon of said parent nucleic acid sequence is 
rate limiting for the production said at least one 
protein to said level, said method including the step 
of introducing into said cell a nucleic acid sequence 

3 0 capable of expressing therein an m isoaccepting transfer 
RNA specific for said at least one codon. 
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In yet a further aspect, the invention 
resides in virus particles resulting from the above 
methods . 

The invention further contemplates cells or 
5 tissues containing therein the synthetic nucleic acid 
sequences of the invention, or alternatively, cells or 
tissues produced from the methods of the invention. 

The invention is further described with 
reference to the following non-limiting examples. 

EXAMPLE 1

Expression of synthetic L1 and L2 protein in
undifferentiated cells

Materials and Methods

Codon replacements in the bovine PV (BPV)
L1 and L2 genes

The DNA and amino acid sequences of the
wild-type L1 (SEQ ID NOS: 1, 2) and L2 genes (SEQ ID
NOS: 5, 6) are shown respectively in Figures 1A and 1B.
To determine whether the presence of rare codons in
wild-type L1 (SEQ ID NO: 1) and L2 (SEQ ID NO: 5) genes
(Table 1) inhibited translation, we synthesized the L1
(SEQ ID NO: 3) and L2 (SEQ ID NO: 7) genes by using
synonymous substitutions as shown. To construct the
synthetic sequences, we synthesized 11 pairs of
oligonucleotides for L1 and 10 pairs of
oligonucleotides for L2. Each pair of
oligonucleotides has restriction sites incorporated to
facilitate subsequent cloning (Figures 1A and 1B).
The degenerate oligonucleotides were used to amplify
L1 and L2 sequences by PCR using a plasmid with the BPV1
genome as the template. The amplified fragments were
cut with appropriate enzymes and sequentially ligated
to pUC18 vector, producing pUCHBL1 and pUCHBL2. The
synthetic L1 (SEQ ID NO: 3) and L2 (SEQ ID NO: 7)
sequences were sequenced and found to be error-free,
and then sub-cloned into the mammalian expression
vector pCDNA3 containing SV40 ori (Invitrogen), giving
expression plasmids pCDNA/HBL1 and pCDNA/HBL2. To
compare expression of L1 and L2 with that of the
original sequences, the wild type L1 (SEQ ID NO: 1) and
L2 (SEQ ID NO: 5) genes were cloned into the pCDNA3
vector, resulting in pCDNA/BPVL1wt and pCDNA/BPVL2wt.

Immunofluorescence and Western blot
staining

For immunoblotting assays, Cos-1 cells in
6-well plates were transfected with 2 µg L1 or L2
expression plasmids using Lipofectamine (Gibco). 36
hrs after transfection, cells were washed with 0.15 M
phosphate buffered 0.9% NaCl (PBS) and lysed in SDS
loading buffer. The cellular proteins were separated
by 10% SDS PAGE and blotted onto nitrocellulose
membrane. The L1 or L2 proteins were identified by
electrochemiluminescence (Amersham, UK), using BPV1 L1
(DAKO) or L2-specific (17) antisera. For
immunofluorescent staining, Cos-1 cells were grown on
8-chamber slides, transfected with plasmids, and
fixed and permeabilised with 85% ethanol 36 hr after
transfection. The slides were blocked with 5% milk-PBS
and probed with L1 or L2-specific antisera, followed
by FITC-conjugated anti-rabbit IgG (Sigma). For GFP or
PGFP plasmid transfected cells, the cells were fixed
with 4% buffered formaldehyde and viewed by
epifluorescence microscopy.



Northern blotting

Cos cells transfected with various plasmids
were used to extract cytoplasmic or total RNA using
the QIAGEN RNeasy mini kit according to the
supplier's handbook. Briefly, for cytoplasmic RNA
purification, buffer RLN (50 mM Tris, pH 8.0, 140 mM
NaCl, 1.5 mM MgCl2 and 0.5% NP40) was directly added to
monolayer cells and cells were lysed at 4 °C for 5 min.
After the nuclei were removed by centrifugation,
cytoplasmic RNAs were purified by column. For total
RNA extraction, the monolayer cells were lysed using
buffer RLT supplied with the kit and RNA was purified by
spin column. The purified RNAs were separated on a 1.5%
agarose gel in the presence of formaldehyde. The RNAs
were then blotted onto nylon membrane and probed with
(a) 1:1 mixed 5'-end labelled L1 wt and HBL1
fragments; (b) 1:1 mixed 5'-end labelled L2 wt and
HBL2 fragments; (c) 1:1 mixed 5'-end labelled GFP and
PGFP fragments or (d) randomly labelled GAPDH
fragment. The blots were washed extensively at 65 °C
and exposed to X-ray films for three days.

Results

To test the hypothesis that the codon
composition of the genes encoding the L1 and L2 capsid
proteins of papillomavirus (PV) contributes to their
preferential expression in differentiated epithelial
cells, we produced synthetic BPV1 L1 (SEQ ID NO: 3) and
L2 (SEQ ID NO: 7) genes, substituting codons
preferentially used in mammalian genes for the codons
frequently present in the wild type BPV1 L1 and L2
sequences which are rare in eukaryotic genes (Figures
1A, 1B).

For the L1 gene, a total of 202 base
substitutions were made in 196 codons, without
changing the encoded amino acid sequence (Figure 1A).
This synthetic "humanized" BPV L1 gene (SEQ ID NO: 3)
was designated HBL1. In a similarly modified BPV1 L2
gene (SEQ ID NO: 7) designated HBL2, 303 bases were
changed to substitute 290 less frequently used codons
with the corresponding preferentially used codons.
Using the synthetic HBL1 (SEQ ID NO: 3) and HBL2 (SEQ
ID NO: 7) genes, we constructed two eukaryotic
expression plasmids based on pCDNA3, designated
pCDNA/HBL1 and pCDNA/HBL2. Similar expression
plasmids, constructed with the wild type BPV1 L1 (SEQ
ID NO: 1) and BPV1 L2 (SEQ ID NO: 5) genes, were
designated pCDNA/BPVL1wt and pCDNA/BPVL2wt,
respectively. In each of these plasmids the SV40 ori
allowed replication in Cos-1 cells, and the L1 or L2
gene was driven by a strong constitutive CMV promoter.
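Counts of the kind reported above (codons altered and bases altered, with the encoded protein unchanged) can be checked mechanically. The following is a minimal sketch under stated assumptions: it uses the standard genetic code, the function name `substitution_summary` is introduced here for illustration, and the two short sequences are toy inputs, not SEQ ID NO: 1 or SEQ ID NO: 3.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitution_summary(parent, synthetic):
    """Return (codons_changed, bases_changed) between two in-frame coding
    sequences, raising an error if any change alters the encoded amino acid."""
    if len(parent) != len(synthetic) or len(parent) % 3 != 0:
        raise ValueError("sequences must be in-frame and of equal length")
    codons_changed = 0
    bases_changed = 0
    for i in range(0, len(parent), 3):
        old, new = parent[i:i + 3], synthetic[i:i + 3]
        if CODON_TABLE[old] != CODON_TABLE[new]:
            raise ValueError(f"non-synonymous change at codon {i // 3 + 1}")
        if old != new:
            codons_changed += 1
            bases_changed += sum(a != b for a, b in zip(old, new))
    return codons_changed, bases_changed


# Toy example only: three codons differ, by five bases in total.
print(substitution_summary("ATGTTACGTACATAA", "ATGCTGAGGACCTAA"))
```

Because a single codon can differ at more than one position, the base count is expected to be at least as large as the codon count, as with the figures quoted for HBL1 and HBL2.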

To compare the expression of the synthetic
humanized and the wild type BPV1 L1 or BPV1 L2 genes,
we separately transfected Cos-1 cells with each of the
L1 and L2 plasmids described above. Transfected cells
were analyzed for expression of L1 (SEQ ID NOS: 2, 4) or
L2 (SEQ ID NOS: 6, 8) protein by immunofluorescence 36 hr
after transfection (Figures 2A and 3A). Cells
transfected with the pCDNA3 expression plasmid
containing the synthetic humanized L1 (SEQ ID NO: 3) or
L2 (SEQ ID NO: 7) genes were observed to produce large
amounts of the corresponding protein, while cells
transfected with expression plasmids with the wild
type L1 (SEQ ID NO: 1) or L2 (SEQ ID NO: 5) sequences
produced no detectable L1 or L2 protein (Figures 2A
and 3A, see nuclear staining of L1 and L2 proteins).
To compare more accurately the expression of the
different L1 and L2 constructs, L1 and L2 protein
expression was assessed by immunoblot in Cos-1 cells
transfected with the wild type or synthetic humanized
BPV1 L1 or L2 pCDNA3 expression constructs (Figures 2B
and 3B). Large amounts of immunoreactive L1 and L2
proteins were expressed from the synthetic humanized
L1 (SEQ ID NO: 3) and L2 (SEQ ID NO: 7) sequences, but
no L1 or L2 protein was expressed from the wild type
L1 and L2 sequences (SEQ ID NOS: 1, 5).

To establish whether the alterations to the
primary sequence of the L1 and L2 mRNA which resulted
from the codon alterations also affected steady state
expression of the corresponding message, mRNA was
prepared from Cos-1 cells transfected with the various
capsid protein gene constructs. Using GAPDH as an
internal standard it was established by Northern blot
that two to three times more modified than wild type
L1 mRNA, and similar levels of wild type and modified
L2 mRNA, were present in the cytoplasm of transfected
cells (Figures 2C and 3C). The amount of L1 or L2
protein expressed per arbitrary unit of L1 or L2 mRNA
was at least 100 fold higher for the humanized gene
constructs than for the natural gene constructs.
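The normalisation behind a "protein per unit of mRNA" comparison can be written out as a short calculation. This is a sketch only: the densitometry values below are invented placeholders chosen to mimic the pattern described (about 2.5-fold more modified mRNA, and a wild type protein signal near the limit of detection), and the function name `protein_per_mrna` is an assumption of this illustration.

```python
def protein_per_mrna(protein_signal, mrna_signal, gapdh_signal):
    """Protein signal divided by GAPDH-normalised mRNA signal (arbitrary units)."""
    if mrna_signal <= 0 or gapdh_signal <= 0:
        raise ValueError("mRNA and GAPDH signals must be positive")
    normalised_mrna = mrna_signal / gapdh_signal
    return protein_signal / normalised_mrna


# Hypothetical band intensities, not data from the specification.
wild_type = protein_per_mrna(protein_signal=1.0, mrna_signal=50.0, gapdh_signal=100.0)
humanized = protein_per_mrna(protein_signal=300.0, mrna_signal=125.0, gapdh_signal=100.0)

fold_difference = humanized / wild_type
print(f"humanized / wild type protein per unit mRNA: {fold_difference:.0f}-fold")
```

Dividing by the normalised mRNA level is what separates a translational effect from a difference in message abundance: a 2.5-fold excess of mRNA alone cannot account for a difference of this size in protein.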
EXAMPLE 2

Papillomavirus late protein translation in vitro

Materials and Methods

In vitro translation assay

One microgram of each plasmid was
incubated with 20 µCi 35S-methionine (Amersham) and 40
µL T7 coupled rabbit reticulocyte or wheat germ
lysates (Promega). Translation was performed at 30 °C
and stopped by adding SDS loading buffer. The L1
proteins were separated by 10% SDS PAGE and examined
by autoradiography.

Production of aminoacyl-tRNA

2.5 x 10⁻⁴ M tRNA (Boehringer) was added to
a 20 µL reaction containing 10 mM Tris-acetate,
pH 7.8, 44 mM KCl, 12 mM MgCl2, 9 mM β-mercaptoethanol,
38 mM ATP, 0.25 mM GTP and 7 µL rabbit reticulocyte
extract. The reaction was carried out at 25 °C for 20
min, and 30 µL H2O was added to the reaction to dilute
the tRNAs to 1 x 10⁻⁴ M. The aminoacyl-tRNAs were then
aliquoted and stored at -70 °C.
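The dilution stated above follows from conservation of the amount of tRNA (C1 x V1 = C2 x V2). The short check below is an illustration that assumes the 2.5 x 10⁻⁴ M figure refers to the tRNA concentration in the 20 µL reaction; the function name is introduced here for convenience.

```python
def diluted_concentration(c_initial, v_initial, v_added):
    """Concentration after adding v_added to v_initial (same volume units)."""
    return c_initial * v_initial / (v_initial + v_added)


# 20 uL reaction at 2.5e-4 M tRNA, diluted with 30 uL of water.
final = diluted_concentration(c_initial=2.5e-4, v_initial=20.0, v_added=30.0)
print(f"{final:.1e} M")
```

The final volume is 50 µL, so the concentration falls by a factor of 2.5, in agreement with the stated 1 x 10⁻⁴ M.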



Results

As the major limitation to expression of
the wild type BPV L1 and L2 genes appeared to be
translational in our system, we wished to test whether
this limitation reflected a limited availability of
the appropriate tRNA species for gene translation. As
transient expression of the synthetic genes within
intact cells may be regulated by many factors, we
tested our hypothesis in a cell free system using
rabbit reticulocyte lysate (RRL) or wheat germ lysate
to examine gene translation. Similar amounts of
plasmids expressing the wild type or synthetic
humanized BPV1 L1 gene were added to a T7 RNA
polymerase-coupled RRL transcription/translation
system in the presence of 35S-methionine. After 20
minutes, translated proteins were separated by SDS
PAGE and visualized by autoradiography. Efficient
translation of the modified L1 gene was observed
(Figure 4, top panel, lane 2), while translation of
the wild type BPV1 L1 sequence resulted in a weak 55
kDa L1 band (Figure 4, top panel, lane 1). We
reasoned that although the wild type sequence was not
optimized for translation in RRL, some translation
would occur as there would be no cellular mRNA species
competing for the 'rare' codons present in the wild
type L1 sequence. The above data suggest that the
observed difference in efficiency of translation of
the wild type and synthetic humanized L1 genes is a
consequence of limited availability of the tRNAs
required for translation of the rare codons present in
the wild type gene. We therefore expected that
addition of excess tRNA to the in vitro translation
system would overcome the inhibition of translation of
the wild type L1 gene. To address this question, 10⁻⁵
M aminoacyl-tRNAs from yeast were added into the RRL
translation system, and L1 protein synthesis was
assessed. Introduction of exogenous tRNAs resulted in
a dramatic improvement in translation of the wild type
L1 sequence, which now gave a yield of L1 protein
comparable to that observed with the synthetic
humanized L1 sequence (SEQ ID NO: 3) (Figure 4, top
panel). Enhancement of translation of the wild type
L1 gene (SEQ ID NO: 1) by aminoacyl-tRNA was
dose-dependent, with an optimum efficiency at 10⁻⁵ M tRNA.
As addition of exogenous tRNA improved the yield of L1
protein translated from the wild type L1 gene sequence
(SEQ ID NO: 1), we assessed the speed of translation of
wild type and humanized L1 mRNA. Samples were
collected from the translation mixture every 2
minutes, starting at the 8th minute. Translation of L1
(SEQ ID NOS: 2, 4) from the wild type sequence (SEQ ID
NO: 1) was much slower than from the humanized L1
sequence (SEQ ID NO: 3) (Figure 4, bottom panel), and
the retardation of translation could be completely
overcome by adding commercially available yeast tRNA.
Yeast tRNA was chosen in the
above analysis because the codon usage in yeast is
similar to that of papillomavirus (Table 1). Addition
of exogenous tRNA did not significantly improve the
translation of the humanized L1 gene (SEQ ID NO: 3),
indicating that this sequence was optimized with
regard to codon usage for the rabbit reticulocyte
translation machinery (Figure 4, bottom panel). In
separate experiments we established that wt L1
translation could also be enhanced by liver tRNA
(Figure 4) and by tRNAs extracted from bovine skin
epidermis, which presumably constitute a mixture of
tRNAs from differentiated and undifferentiated cells
(data not shown).
EXAMPLE 3

Translation of wild type L1 is efficient in wheat germ
extract

To further test our hypothesis that tRNA
availability is a determinant of expression of the
wild type BPV1 L1 gene (SEQ ID NO: 1), we examined the
translation of L1 in a cell type in which a quite
different set of tRNAs would be available. In a wheat
germ translation system, wild type L1 mRNA was
translated as efficiently as humanized L1 mRNA, and
addition of exogenous aminoacyl-tRNAs did not improve
the translation efficiency of either wild type or
humanized sequences (Figure 4, bottom panel). This
indicated that wheat germ contains sufficient amounts of
the tRNAs which are limiting for translation of the wild
type L1 sequence in RRL to allow efficient L1
translation.

EXAMPLE 4

Modified late genes can be expressed in
undifferentiated cells from papillomavirus promoter(s)

While our data presented above indicate
that translation is limiting for the production of
BPV1 capsid proteins in our test system, these
experiments were conducted in systems which are not
truly representative of viral late gene
transcription from the BPV genome, in part because the
genes were driven by a strong CMV promoter. We
therefore wished to establish whether synthetic
humanized BPV capsid protein mRNA would be translated
more efficiently than the wild type mRNA if
transcribed from the natural BPV1 promoter. This
would establish whether translation was indeed one of
the limiting factors for expression of BPV1 late genes
driven from the natural cryptic late gene promoter in
an undifferentiated cell. The BPV genome was cleaved
at nt 4450 and 6958 with BamHI/HindIII and the
original L1 (nt 4186-5595) and L2 (5068-7095) ORFs
were removed. The synthetic humanized L2 gene (SEQ ID
NO: 7), together with an SV40 ori sequence to allow
plasmid replication in eukaryotic cells, was inserted
into the BPV genome lacking L1/L2 ORF sequences. This
plasmid (Figure 5A) was designated pCICR1. A similar
plasmid was constructed with wild type (SEQ ID NO: 5)
rather than synthetic humanized L2 and designated
pCICR2. Cos-1 cells were transfected with these
plasmids and L2 protein expression was examined by
immunofluorescence of transfected cells. Synthetic
humanized L2 (SEQ ID NO: 7), driven by the natural BPV-1
promoter, was efficiently expressed, whereas the
wild type L2 sequence (SEQ ID NO: 5), driven from a
similar construct, produced no immunoreactive L2
protein (SEQ ID NOS: 6, 8) (Figure 5B). As
undifferentiated cells supported the expression of the
humanized L2 gene (SEQ ID NO: 7) but not the wild type
L2 (SEQ ID NO: 5) expressed from the cryptic late BPV
promoter, the results confirmed our earlier
observations from experiments using the CMV promoter.
However, the plasmids tested here contained SV40 ori,
designed to replicate the DNA in Cos cells. The
increased copy number of the BPV1 L2 plasmids or the
transcriptional enhancing activity of the SV40 ori
might explain in part the increased efficiency of
expression of L2 in this experimental system when
compared with infected skin. However, the marked
difference in expression between the natural and
humanized genes seen with a CMV promoter construct is
still observed with the natural promoter.



EXAMPLE 5

Substitution of papillomavirus-preferred codons
prevents translation but not transcription of a
non-papillomavirus gene in undifferentiated cells

Materials and Methods

Codon replacement in gfp gene

To construct a modified gfp gene (SEQ ID
NO: 11) using papillomavirus preferred codons (PGFP), 6
pairs of oligonucleotides were synthesized. Each pair
of oligonucleotides has restriction sites incorporated
and was used to amplify gfp using a humanized gfp gene
(SEQ ID NO: 9) (GIBCO) as template. The PCR fragments
were ligated into the pUC18 vector to produce pUCPGFP.
The PGFP gene was sequenced, and cloned into the BamHI
site of the same mammalian expression vector, pCDNA3,
under the CMV promoter. The DNA and deduced amino
acid sequences of the humanized GFP gene are shown in
Figure 1C. Mutations introduced into the wild type
gfp gene (SEQ ID NO: 9) to produce the Pgfp gene (SEQ
ID NO: 11) are indicated above the corresponding
nucleotides of the wild-type sequence.
Results

To further confirm that codon usage can
alter gene expression in mammalian cells, we made a
further variant of a synthetic gfp gene modified for
optimal expression in eukaryotic cells (Zolotukhin et
al., 1996, J. Virol. 70:4646-4654). In our variant,
codons optimized for expression in eukaryotic cells
were substituted by those preferentially used in
papillomavirus late genes. Of 240 codons in the
humanized gfp gene (SEQ ID NO: 9), which expresses high
levels of fluorescent protein in cultured cells, 156
were changed to the corresponding papillomavirus late
gene-preferred codons to produce a new gfp gene (SEQ
ID NO: 11) designated Pgfp. Expression of Pgfp (SEQ ID
NO: 11) in undifferentiated cells was compared with
that of humanized gfp (SEQ ID NO: 9). Cos-1 cells
transfected with the humanized gfp (SEQ ID NO: 9)
produced a bright fluorescent signal after 24 hrs,
while cells transfected with Pgfp (SEQ ID NO: 11)
produced only a faint fluorescent signal (Figure 6A).
To confirm that this difference reflected differing
translational efficiency, gfp specific mRNA was measured
in both transfections and found not to be
significantly different (Figure 6B). Thus, codon
usage and corresponding tRNA availability apparently
determine the observed restriction of expression of
PV late genes, and modification of codon usage in
other genes similarly prevents their expression in
undifferentiated cells.
EXAMPLE 6

PGFP with papillomavirus-preferred codons is
efficiently expressed in vivo in differentiated mouse
keratinocytes

Materials and Methods

Delivery of plasmid DNA into mouse skin by
gene gun

Fifty micrograms of DNA was coated onto 25
mg gold micro-carriers by calcium precipitation,
following the manufacturer's instructions (Bio-Rad).
C57/bl mouse skin was bombarded with gold particles
coated with DNA plasmid at a pressure of 600 psi.
Serial sections were taken from the skin and examined
for distribution of the particles, confirming that a
pressure of 600 psi could deliver particles throughout
the epidermis.

Results

Mice were shot with gold beads carrying
PGFP DNA plasmid and, 24 hrs later, skin samples were
cut from the site of DNA delivery and examined for
expression of GFP protein (SEQ ID NOS: 10, 12).
Fluorescence was detected mostly in upper keratinocyte
layers, representing the differentiated epithelium,
and was not seen in undifferentiated basal cells. In
contrast, skin sections shot with the humanized GFP
plasmid showed fluorescence in cells randomly
distributed throughout the whole epidermis (Figure 7).
Although GFP-positive cells were rare in both PGFP-
(SEQ ID NO: 11) and GFP-inoculated (SEQ ID NO: 9) mouse
skin, fluorescence was observed only in differentiated
strata in the PGFP sample (SEQ ID NO: 11), whereas
fluorescence was observed throughout the epidermis in
GFP-inoculated (SEQ ID NO: 9) mouse skin. This result
confirmed that the use of papillomavirus-preferred
codons resulted in the protein being expressed in an
epithelial differentiation-dependent manner.

EXAMPLE 7

Microinjection of yeast tRNA and wild type
L1 gene into cultured cells

To test if yeast tRNA could facilitate
expression of wild type BPV-1 L1 (SEQ ID NO: 1) (as
yeast uses a similar set of codons to those observed
in papillomavirus for its own genes), 2 pL of mixtures
containing tRNA (2 mg/mL) (purified yeast tRNA
(Boehringer Mannheim) or bovine liver tRNA as control)
and BPV L1 DNA (2 µg/mL) can be injected into CV-1
cells (Lu and Campisi, 1992, Proc. Natl. Acad.
Sci. U.S.A. 89 3889-3893). The injected cells can
then be cultured for 48 hrs at 37 °C and examined for
expression of the L1 gene by standard immunofluorescence
methods using BPV L1-specific antibody and quantified
by FACS analysis (Qi et al 1996, Virology 216 35-45).

EXAMPLE 8

Establishment of a cell line which can
continuously produce HPV virus particles

To produce infectious PV, various methods
have been tried including the epithelial raft culture
system (Dollard et al 1992, Genes Dev 6 1131-1142),
and cell lines containing BPV-1 episomal DNA, and
infected by BPV-1 L1/L2 recombinant vaccinia (Zhou et
al 1993, J. Gen. Virol. 74 763-768) or transfected by
SFV RNA (Roden et al 1996, J. Virol. 70 5875-5883).
The yield of particles is in each case low. In a
reduction to practice of our discovery, synthetic BPV
L1 (SEQ ID NO: 3) and L2 genes (SEQ ID NO: 7) (as
described in Example 1) can be used to produce
infectious BPV in a cell line containing BPV-1
episomal DNA. Fibroblast cell lines (CON/BPV)
containing BPV-1 episomal DNA (Zhou et al 1993, J.
Gen. Virol. 74 763-768) can be used for transfection
of the synthetic BPV-1 L1 (SEQ ID NO: 3) and L2 genes
(SEQ ID NO: 7) under control of the CMV promoter. BPV
particles may then be purified from the cell lysate
and the purified particles examined for the presence
of BPV-1 genome. Standard methods such as
transfection with Lipofectamine (BRL) and G418
selection of transfected cells can be utilized to
generate suitable transfectants expressing humanized
L1 (SEQ ID NO: 3) and L2 (SEQ ID NO: 7) in the
background of BPV-1 episomal DNA. Examination of L1
and L2 protein expression can be performed using
rabbit anti-BPV L1 or rabbit anti-BPV L2 polyclonal
antibodies. BPV particles can then be purified using
our published methods (Zhou et al 1995, Virology 214
167-176) and can be characterized by electron
microscopy and DNA blotting. The infectivity of BPV
particles isolated from the cultured cells may be
tested in focus formation assays using C127
fibroblasts.
EXAMPLE 9

Method for extracting and measuring tRNA from tissues

Tissue (100 g) is homogenized in a Waring
Blender with 150 mL of phenol (Mallinckrodt,
Analytical Reagent, 88%) saturated with water (15:3)
and 150 mL of 1.0 M NaCl, 0.005 M EDTA in 0.1 M
Tris-chloride buffer, pH 7.5. The homogenate was spun
for ten minutes at top speed in the International
clinical centrifuge and the upper layer was carefully
decanted off. To this aqueous layer, three volumes of
95% ethanol were added. The resultant precipitate was
spun down at top speed in the International clinical
centrifuge and resuspended in 250 mL of 0.1 M
Tris-chloride buffer, pH 7.5. This solution was added
(flow rate of 15-20 drops per minute) to a column (2 x
10 cm) of 2 g of DEAE-cellulose previously
equilibrated with cold 0.1 M Tris-chloride buffer, pH
7.5. The column was then washed with 1 L of
Tris-chloride buffer, pH 7.5 and the RNA eluted with 1.0 M
NaCl in 0.1 M Tris-chloride buffer, pH 7.5. The first
10 mL of NaCl solution were discarded as "hold-up."
Sufficient salt solution (60-80 mL) was then collected
until the optical density of the effluent was less
than three at 260 nm. This solution was extracted
twice with an equal volume of phenol saturated with
water and twice with ether. To the aqueous solution
containing the RNA, three volumes of 95% ethanol were
added and the solution was allowed to stand overnight
in the cold. The precipitate was spun down and washed
first with 80% and then twice with 95% ethanol and
dried in a vacuum. Approximately 60 mg of soluble
RNA were obtained from a 100 g lot of rat liver.

Quantitating tRNAs

The following nylon membranes are used:
Biodyne A and B (PALL). For the preparation of dot
blots, the tRNA samples (from 1 pg to 5 ng) are
denatured at 60 °C for 15 min in 1-5 µL of 15%
formaldehyde-10x SSC (SSC is NaCl 0.3 M, tri-sodium
citrate 0.03 M). The samples are spotted in 1 µL
aliquots onto the membranes that have been soaked for
15 min in deionized water and slightly dried between
two sheets of 3 MM Whatman paper prior to the
application of the samples. The tRNAs are fixed
covalently to the membranes by ultraviolet
irradiation (10 min using an ultraviolet lamp at 254
nm and 100 W strength at a distance of 20 cm) and the
membranes are baked for 2-3 h at 80 °C.

A 5'-end labelled synthetic
deoxyribo-oligonucleotide complementary to the A54-A73 sequence
of the tRNA is used as a probe for the hybridization
experiments. Labelling of the oligonucleotide is
performed by direct phosphorylation of the 5'-OH
ended probe.

For hybridisation experiments, the
UV-irradiated membranes are first preincubated for 5 h at
50 °C in 50% deionized formamide, 5 x SSC, 1% SDS,
0.04% Ficoll, 0.04% polyvinylpyrrolidone and 250 µL/mL
of sonicated salmon sperm DNA using 5 mL of buffer for
100 cm² of membrane. Hybridization is finally
performed overnight at 50 °C in the above solution
(2.5 mL/100 cm²) to which the labeled probe has been
added. After hybridization, the membranes are washed
twice in 2 x SSC, 0.1% SDS for 5 min at room
temperature, twice in 2 x SSC, 1% SDS for 30 min at 60
°C and finally in 0.1 x SSC, 0.1% SDS for 30 min at
room temperature. To detect the hybridized probes the
membranes are exposed for 16 h to Fuji XR film at -70 °C
with an intensifying screen.



Sequence of tRNA probes

The sequences of the tRNA probes are as
follows:

Ala GCR:                 5'-TAAGGACTGTAAGACTT (SEQ ID NO: 13)
Arg [codon illegible]:   5'-CGAGCCAGCCAGGAGTC (SEQ ID NO: 14)
Asn [codon illegible]:   5'-CTAGATTGGCAGGAATT (SEQ ID NO: 15)
Asp GAC:                 5'-TAAGATATATAGATTAT (SEQ ID NO: 16)
Cys TCC:                 5'-AAGTCTTAGTAGAGATT (SEQ ID NO: 17)
Glu GAA:                 5'-TATTTCTACACAGCATT (SEQ ID NO: 18)
Gln [codon illegible]:   5'-CTAGGACAATAGGAATT (SEQ ID NO: 19)
Gly [codon illegible]:   5'-TACTCTCTTCTGGGTTT (SEQ ID NO: 20)
His [codon illegible]:   5'-TGCCGTGACTCGGATTC (SEQ ID NO: 21)
Ile ATC:                 5'-TAGAAATAAGAGGGCTT (SEQ ID NO: 22)
Leu CTA:                 5'-TACTTTTATTTGGATTT (SEQ ID NO: 23)
Leu [codon illegible]:   5'-TATTAGGGAGAGGATTT (SEQ ID NO: 24)
Lys [codon illegible]:   5'-TCACTATGGAGATTTTA (SEQ ID NO: 25)
Lys [codon illegible]:   5'-CGCCCAACGTGGGGCTC (SEQ ID NO: 26)
Met elong:               5'-TAGTACGGGAAGGATTT (SEQ ID NO: 27)
Phe TTC:                 5'-TGTTTATGGGATACAAT (SEQ ID NO: 28)
Pro CCA:                 5'-TCAAGAAGAAGGAGCTA (SEQ ID NO: 29)
Pro CCI:                 5'-GGGCTCGTCCGGGATTT (SEQ ID NO: 30)
Ser [codon illegible]:   5'-ATAAGAAAGGAAGATCG (SEQ ID NO: 31)
Thr ACA:                 5'-TGTCTTGAGAAGAGAAG (SEQ ID NO: 32)
Tyr TAC:                 5'-TGGTAAAAAGAGGATTT (SEQ ID NO: 33)
Val GTA:                 5'-TCAGAGTGTTCATTGGT (SEQ ID NO: 34)
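Because each probe is a deoxyoligonucleotide complementary to a stretch of its target tRNA, the stretch it is expected to hybridise to can be derived as the reverse complement of the probe. The snippet below is a minimal sketch of that derivation for the Val probe (SEQ ID NO: 34); it assumes plain Watson-Crick pairing and ignores the modified bases found in natural tRNAs, and the function name is introduced here for illustration.

```python
# Watson-Crick complement for the DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


probe = "TCAGAGTGTTCATTGGT"  # Val probe, SEQ ID NO: 34

# Sequence the probe would pair with, written 5' to 3' in the DNA alphabet,
# and the same stretch in the RNA alphabet (T replaced by U).
target_dna = reverse_complement(probe)
target_rna = target_dna.replace("T", "U")

print(target_dna)
print(target_rna)
```

Applying the function twice returns the original probe, which is a convenient sanity check when transcribing probe sequences by hand.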

EXAMPLE 10

Comparison of the relative abundance of tRNA species
in undifferentiated and differentiated keratinocytes

Materials and Methods

Isolation of epidermal cells

2-day old mice were killed and their skins
removed. The skins were digested with 0.25% trypsin
PBS at 4 °C overnight. The epidermis was separated
from the dermis using forceps and minced with scissors
in 10% FCS DMEM medium. The cell suspension was first
filtered through a 1 mm and then a 0.2 mm nylon net.
The cell suspension was then pelleted and washed twice
with PBS.

Density gradient centrifugation

The keratinocytes were resuspended in 30%
Percoll and separated by centrifugation through a
discontinuous Percoll gradient (1.085, 1.075 and 1.050
g/mL) at 1200 x g at room temperature for 25 min. The
cells were then washed with PBS and used to extract
tRNA.

tRNA purification

The cells were lysed in 5 mL of lysis
buffer (0.2 M NaOH, 1% SDS) for 10 min at room
temperature. The lysate was neutralized with 5 mL of
3.0 M potassium acetate (pH 5.5). After
centrifugation, the supernatant was diluted with 3
volumes of 100 mM Tris (pH 7.5) and added to a DEAE
column equilibrated with 100 mM Tris (pH 7.5). An
equal volume of isopropanol was added to the aqueous
solution containing tRNA, and the solution was allowed
to stand overnight at 4 °C. The tRNA was spun down
and washed with 75% ethanol, then dissolved in
RNase-free water.

tRNA blotting

10 ng of each tRNA sample in 1 µL was
denatured at 60 °C for 15 min in 4 µL formaldehyde and
5 µL 20 x SSC. The samples were spotted in 1 µL
aliquots onto charged nylon membrane (Amersham), and
the tRNAs were fixed with UV and probed with
32P-oligonucleotides.

Results

Comparison of the abundance of the tRNA
species in undifferentiated and differentiated
keratinocytes showed that the levels of some tRNA
populations changed dramatically. For example, the
levels of tRNAs specific for Ala GCA, Leu [codon
illegible] and Leu CTA were increased in differentiated
cells, while tRNAs for Arg CGA, Pro CCI and Asn [codon
illegible] were more abundant in
undifferentiated keratinocytes (see Table 2).
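A comparison of this kind reduces to a ratio of dot-blot signals for each probe between the two cell populations. The sketch below shows one way such ratios might be classified; the intensities are invented placeholders (not the values of Table 2), the two-fold threshold is an arbitrary assumption, and the function name is introduced here for illustration.

```python
def classify_trna_changes(undifferentiated, differentiated, threshold=2.0):
    """Classify each tRNA by the ratio of differentiated to undifferentiated
    signal. A ratio at or above `threshold` (or at or below its reciprocal)
    is reported as a change; anything in between is reported as similar."""
    result = {}
    for name, undiff_signal in undifferentiated.items():
        ratio = differentiated[name] / undiff_signal
        if ratio >= threshold:
            result[name] = "higher in differentiated"
        elif ratio <= 1.0 / threshold:
            result[name] = "higher in undifferentiated"
        else:
            result[name] = "similar"
    return result


# Hypothetical dot-blot intensities in arbitrary units.
undiff = {"Leu-CTA": 10.0, "Arg-CGA": 30.0, "Tyr-TAC": 20.0}
diff = {"Leu-CTA": 40.0, "Arg-CGA": 10.0, "Tyr-TAC": 22.0}

for trna, call in classify_trna_changes(undiff, diff).items():
    print(trna, call)
```

Equal loading of tRNA in each dot (10 ng per sample in the protocol above) is what allows the raw signals to be compared directly in this way.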

GENERAL DISCUSSION 

In the present specification the inventors
have confirmed that one determinant of the efficiency
of translation of a gene in mammalian cells is its
codon composition. This observation has commonly been
made when genes from prokaryotic organisms have been
expressed in eukaryotic cells (Smith, D. W., 1996,
Biotechnol. Prog. 12:417-422). The present inventors
have also presented evidence that mRNAs encoding the
capsid genes of papillomavirus are not effectively
translated in cultured eukaryotic cells, apparently
because tRNA availability is rate limiting for
translation, and that the block to PV late gene
translation in eukaryotic cells in culture can be
overcome by altering the codon usage of the late genes
to match the consensus for mammalian genes, or
alternatively by providing exogenous tRNAs.
Alterations to mRNA secondary structure or protein
binding (Sokolowski, et al., 1998, J. Virol.
72:1504-1515) as a consequence of the changes to the
primary sequence of the PV capsid genes might
contribute to the observed differences in efficiency
of translation of the natural and modified PV capsid
gene mRNAs in cultured cells. However, the enhancement
of translation of the natural but not the modified mRNA
that was observed after addition of tRNA in a
mammalian in vitro translation system, which was not
observed in a plant translation system, strengthens
the argument that tRNA availability is rate limiting
for translation of the natural gene in mammalian
cells. A shortage of critical tRNAs could result in
slowed elongation of the nascent peptide or premature
termination of translation (Oba, et al., 1991,
Biochimie 73:1109-1112). Slowed elongation appears to
be the major consequence for the PV late gene.
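
By way of illustration only, the alteration of codon usage referred to above (replacing existing codons with synonymous codons while leaving the encoded protein unchanged) might be sketched in Python as follows. The function names and the PREFERRED table are hypothetical; the preferences are loosely modelled on substitutions visible in the first codons of SEQ ID NO: 3 and are not the complete substitution scheme used by the inventors.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical preferences (amino acid -> codon), for illustration only.
PREFERRED = {"A": "GCC", "P": "CCC", "T": "ACC", "V": "GTG", "L": "CTG", "Q": "CAG"}


def normalize(cds: str) -> str:
    """Upper-case, strip whitespace, use DNA letters, and check the frame."""
    seq = "".join(cds.upper().split()).replace("U", "T")
    if not seq or len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    return seq


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acids."""
    seq = normalize(cds)
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def recode(cds: str, preferred: dict[str, str]) -> str:
    """Replace each codon by the preferred synonymous codon, if one is given."""
    seq = normalize(cds)
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE[codon]
        new_codon = preferred.get(aa, codon)
        # Guard against a preference table that would change the protein.
        if CODON_TABLE[new_codon] != aa:
            raise ValueError(f"{new_codon} is not synonymous with {codon}")
        out.append(new_codon)
    return "".join(out)


# First 16 codons of the wild-type BPV1 L1 open reading frame (SEQ ID NO: 1).
PARENT = "atg gcg ttg tgg caa caa ggc cag aag ctg tat ctc cct cca acc cct"
SYNTHETIC = recode(PARENT, PREFERRED)
print(SYNTHETIC)
print(translate(SYNTHETIC) == translate(PARENT))
```

The check inside recode rejects any non-synonymous substitution, so the parent and synthetic sequences always encode the same protein; only the codon composition differs.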

Analysis of codon usage in the PV genome shows that PV
late genes use many codons that mammalian cells rarely
use. For example, PV frequently uses UUA for leucine,
CGU for arginine, ACA for threonine, and AUA for
isoleucine, whereas these codons are significantly
less often used in mammalian genes. In contrast,
papillomavirus late genes can be expressed efficiently
in yeast (Jansen, et al., 1995, Vaccine 13:1509-1514;
Sasagawa, et al., 1995, Virology 206:126-135) and the
codon compositions of yeast and papillomavirus genes
are similar (Table 1). An apparent exception is that
PV L1 genes can be efficiently expressed in insect
cells (Kirnbauer, et al., 1992, Proc. Natl. Acad. Sci.
USA 89:12180-12184) using recombinant baculovirus, or
in various undifferentiated mammalian cells using
recombinant vaccinia (Zhou, et al., 1991, Virology
185:251-257). As infection with vaccinia or
baculovirus down regulates cellular protein synthesis,
the efficient expression of the L1 capsid proteins
under these circumstances may occur because less
cellular mRNA is available in a virus infected cell to
compete with the L1 mRNA for the rarer tRNAs.
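
A codon usage analysis of the kind described above, expressed as a frequency per one thousand codons as in Table 1, might be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the short fragment is used only to keep the example small, so its frequencies are not representative of a whole gene.

```python
from collections import Counter


def codon_usage_per_thousand(cds: str) -> dict[str, float]:
    """Return the frequency of each codon per 1000 codons of an in-frame CDS."""
    # Normalise to upper-case RNA letters and drop whitespace.
    seq = "".join(cds.upper().split()).replace("T", "U")
    if not seq or len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    counts = Counter(codons)
    return {codon: 1000.0 * n / len(codons) for codon, n in counts.items()}


# First 16 codons of the wild-type BPV1 L1 open reading frame (SEQ ID NO: 1).
FRAGMENT = "atg gcg ttg tgg caa caa ggc cag aag ctg tat ctc cct cca acc cct"
USAGE = codon_usage_per_thousand(FRAGMENT)

# List codons from most to least frequent within the fragment.
for codon, freq in sorted(USAGE.items(), key=lambda item: -item[1]):
    print(codon, freq)
```

Applied to complete open reading frames, the same calculation would give values directly comparable with the columns of Table 1.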

Codon composition could be a more general
determinant of gene expression within different stages
of differentiation of the same tissue. Although the
genetic code is essentially universal, different
organisms show differences in codon composition of
their genes, while the codon composition of genes
tends to be relatively similar for all genes within
each organism, and matched to the population of
iso-tRNAs for that organism (Ikemura, T., 1981, J. Mol.
Biol. 146:1-21). However, populations of tRNAs in
differentiating and neoplastic cells are different
(Kanduc, D., 1997, Arch. Biochem. Biophys. 342:1-6;
Yang and Comb, 1968, J. Mol. Biol. 31:138-142; Yang
and Novelli, 1968, Biochem. Biophys. Res. Commun.
31:534-539) and the tRNA populations also vary in cells
growing under different growth conditions (Doi, et
al., 1968, J. Biol. Chem. 243:945-951). Accordingly,
the inventors believe that codon composition and tRNA
availability together provide a primitive mechanism
for spatial and/or temporal regulation of gene
expression. It is well recognized that the G+C
content of many dsDNA viruses, a crude marker for
viral gene codon composition, is markedly different
from the G+C content of the DNA of the cells they
infect (Strauss, et al., 1995, "Virus Evolution" in
Virology (eds. Fields, B. N., et al.),
Lippincott-Raven, Philadelphia, pp 153-171). Viruses may
therefore have evolved to take advantage of codon
composition to regulate their own program of gene
expression, perhaps to avoid expression of lethal
quantities of viral proteins in undifferentiated cells
where the virus utilizes the cellular machinery to
replicate its genome.
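
The use of G+C content as a crude marker for codon composition can be illustrated with a short Python sketch. The function names are hypothetical, and the third-codon-position measure (often called GC3) is an addition made here for illustration rather than a quantity reported in this specification; it is included because synonymous substitutions mostly change the third base of a codon.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    s = "".join(seq.upper().split())
    if not s:
        raise ValueError("empty sequence")
    return (s.count("G") + s.count("C")) / len(s)


def gc3_content(cds: str) -> float:
    """Fraction of G and C bases at the third position of each codon."""
    s = "".join(cds.upper().split())
    if not s or len(s) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    # Every third base, starting from the third base of the first codon.
    return gc_content(s[2::3])


# First 8 codons of the wild-type (SEQ ID NO: 1) and humanized (SEQ ID NO: 3)
# BPV1 L1 open reading frames; both encode Met Ala Leu Trp Gln Gln Gly Gln.
WILD_TYPE = "atg gcg ttg tgg caa caa ggc cag"
HUMANIZED = "atg gcc ctg tgg cag cag ggc cag"

print(gc_content(WILD_TYPE), gc3_content(WILD_TYPE))
print(gc_content(HUMANIZED), gc3_content(HUMANIZED))
```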

As the inventors' observations represent an
apparently novel mechanism of regulation of gene
translation within a single tissue, it is relevant to
consider how this relates to previously proposed
hypotheses for the restriction of expression of PV
late genes to differentiated epithelium. A number of
explanations have been proposed for the observation
that PV late genes are only effectively expressed in
differentiated epithelium. Reduced late gene
transcription may reflect dependence of transcription
from the late promoter on transcription factors
expressed only in differentiated epithelium, or may
alternatively be due to suppression of late promoter
transcription by viral (Stubenrauch, et al., 1996, J.
Virol. 70:119-126) or cellular gene products expressed
in undifferentiated cells. The "late" promoters of
HPV31b and of HPV5 (Haller, et al., 1995, Virology
214:245-255; Hummel, et al., 1992, J. Virol.
66:6070-6080) are described as differentiation dependent,
although the search for relevant transcription control
factors in differentiated keratinocytes by
conventional footprinting and DNA binding studies has
to date been unrewarding. Our data show that capsid
proteins are not translated from PV L1 and L2 mRNAs in
cells transfected with CMV promoter-based expression
vectors (Fig. 2), suggesting that, in addition to any
transcriptional controls that may exist, there is
a post-transcriptional block to capsid protein
synthesis in undifferentiated cells. Sequences
resembling 5' splice donor sites exist within L1 or L2
mRNA or within flanking untranslated message which are
inhibitory to transcription of genes with which they
are associated (Kennedy, et al., 1991, J. Virol.
65:2093-2097; Furth, et al., 1994, Mol. Cell. Biol.
14:5278-5289). Other AU rich sequences in L1 or L2
mRNA promote mRNA degradation (Sokolowski, et al.,
1997, Oncogene 15:2303-2319). These mechanisms
inhibiting L1 and L2 expression in undifferentiated
cells have yet to be shown to be inactive in
differentiated epithelium, to explain the successful
translation of late genes in this tissue.

Because inhibitory RNA sequences within the
L1 coding sequence could have been rendered
non-functional by the systematic codon substitution
employed in the experiments described herein and the
untranslated inhibitory sequences were not included in
the inventors' test system, the respective roles of
inhibitory sequences and codon mismatch in suppression
of PV late gene expression in cultured mammalian cells
cannot be determined. However, regulatory sequences
promoting RNA degradation or inhibiting translation
are presumed to act through interaction with nuclear
or cytoplasmic proteins (Sokolowski, et al., 1998, J.
Virol. 72:1504-1515), and inefficient translation of
native sequence L1 mRNA was observed in a cell free
translation system from anucleate cells, demonstrating
that codon composition of the PV late genes must play
some role in regulation of PV late gene translation.

Further evidence supporting the hypothesis
that codon composition is an important determinant of
PV capsid gene expression was gathered from an
analysis of the 84 PV L1 sequences currently available
in GenBank. The codon composition of the L1 genes,
and particularly the frequency of usage of the rarer
codons, was essentially the same across all the
published sequences (data not shown) as would be
predicted by the similar G+C content of the
papillomavirus genomes. The PV L1 gene is relatively
conserved at the amino acid level, showing 60-80%
amino acid homology between PV genotypes, as might be
expected by the constraints on capsid protein
function. There are, however, no obvious constraining
influences on the codon composition of the PV late
genes beyond those of the inventors' hypothesis, as
the late gene region does not code for other genes,
either in other reading frames or on the complementary
DNA strand, and has no known cis acting regulatory
functions. If codon composition of the capsid genes
were not important for PV function, a considerable
heterogeneity of codon usage might therefore be
expected, given the evolutionary diversity of PVs
(Chan, et al., 1995, J. Virol. 69:3074-3083).

Taken together, the data and evidence
outlined herein make a strong case that codon usage
is a significant determinant of expression of PV late
genes in undifferentiated and differentiated
epithelial cells, and that this observation is
generalizable. Determining the relative roles of
message instability and codon mismatch in controlling
expression in differentiated tissues will require
comparisons of transcriptional activity and
translation of the L1 or L2 genes driven from strong
constitutive promoters in differentiated and
undifferentiated epithelium. Such work should now be
feasible using either transgenic technology or
keratinocyte raft cultures.

Although mechanisms of transcriptional
regulation of PV L1 or L2 gene expression in the
superficial layer of differentiated epithelium have
been proposed (Zeltner et al., 1994, J. Virol.
68:3620; Brown, et al., 1995, Virology 214:259; Stoler
et al., 1992, Hum. Pathol. 23:117; Hummel et al.,
1995, J. Virol. 69:3381; Haller et al., 1995, Virology
214:245; Barksdale and Baker, 1993, J. Virol.
67:5605), measurable PV late gene mRNA is not always
associated with production of late proteins (Zeltner
et al., 1994, supra; Ozbun and Meyers, 1997, J. Virol.
71:5161), and the data presented here suggest that
translation regulation may play a major part in
controlling PV late gene expression. This observation
has implications as herein described for the
regulation of expression of genes related to the
specialised functions of any differentiated tissue,
and also for targeting of expression of therapeutic
genes to such tissue while avoiding the potentially
deleterious consequences of expression of the
exogenous gene in a self renewing stem cell
population.
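
The targeting principle can be sketched in Python: among the synonymous codons for an amino acid, choose the one whose cognate iso-tRNA is most enriched in the target cell relative to another cell. This is a simplified sketch under stated assumptions. The function name and all abundance values are hypothetical (they are not measurements from Table 2), and abundances are keyed by codon for simplicity, which ignores the fact that one iso-tRNA may read more than one codon.

```python
def most_selective_codon(
    synonymous_codons: list[str],
    target_abundance: dict[str, float],
    other_abundance: dict[str, float],
    pseudocount: float = 1e-9,
) -> str:
    """Return the codon whose iso-tRNA is most enriched in the target cell."""

    def enrichment(codon: str) -> float:
        # Ratio of iso-tRNA abundance in the target cell to the other cell;
        # the pseudocount avoids division by zero for undetected tRNAs.
        return target_abundance[codon] / (other_abundance[codon] + pseudocount)

    return max(synonymous_codons, key=enrichment)


LEU_CODONS = ["CUU", "CUC", "CUA", "CUG", "UUA", "UUG"]

# Hypothetical relative iso-tRNA abundances in arbitrary units.
DIFFERENTIATED = {"CUU": 100.0, "CUC": 1.0, "CUA": 10.0,
                  "CUG": 1.0, "UUA": 1.0, "UUG": 1.0}
UNDIFFERENTIATED = {codon: 1.0 for codon in LEU_CODONS}

CHOICE = most_selective_codon(LEU_CODONS, DIFFERENTIATED, UNDIFFERENTIATED)
print(CHOICE)
```

Swapping the two abundance tables would instead select a codon favouring expression in the undifferentiated cell.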

The present invention has been described in
terms of particular embodiments found or proposed by
the present inventors to comprise preferred modes for
the practice of the invention. Those of skill in the
art will appreciate that, in light of the present
disclosure, numerous modifications and changes may be
made in the particular embodiments exemplified without
departing from the scope of the invention. All such
modifications are intended to be included within the
scope of the appended claims.

TABLE 1

The codon usage data for human, cow, yeast
and wheat proteins are derived from published
results (18). The BPV1 data are from the sequences in
the GenBank database.

TABLE 2

Each iso-acceptor tRNA, with its anticodon shown
as a superscript, is shown in the top row. The "+"
symbols indicate the abundance of each tRNA, each "+"
representing about a 10-fold increase.






TABLE 1

Frequency (per one thousand) of codon usage for individual
organisms.

Amino   Codon   Human   Cow     Yeast   Wheat   BPV1
acid                                            L1/L2

ARG     CGA      5.4     5.5     2.3     2.3     7.2
        CGC     11.3    12.2     2.0     7.5     4.1
        CGG     10.4    11.2     1.1     4.6     5.1
        CGU      4.7     3.7     7.5     1.1    10.4
        AGA      9.9     9.9    24.0     4.1    14.4
        AGG     11.1    11.4     7.5     7.1     9.3
LEU     CUA      6.2     4.9    11.8    12.1    18.6
        [remaining LEU rows illegible]
SER     UCA      9.3     7.6    15.6    14.6    16.6
        UCC     17.7    17.6    14.4    10.1    11.4
        UCG      4.2     4.5     6.5     9.6     6.2
        UCU     13.2    11.2    24.6    14.8    15.5
        AGC     18.7    18.7     7.1    12.8    12.4
        AGU      9.4     8.6    11.7    12.9    21.7
THR     ACA     14.4    11.4    15.6     4.6    37.3
        ACC     23.0    21.1    13.9    15.9    19.7
        ACG      6.7     7.8     6.7     4.5     4.1
        ACU     12.7     9.6    22.0    11.8    28.0
PRO     CCA     14.6    12.0
        CCC     20.0    19.2
        CCG      6.5     7.9
        CCU     15.5    14.6
ALA     GCA     14.0    13.1
        GCC     29.1    35.8
        GCG      7.2     9.3
        GCU     19.6    19.1
GLY     GGA     17.1    16.2
        GGC     25.4    28.1
        GGG     17.3    19.2
        GGU     11.2    11.8
VAL     GUA      5.9     5.1
        GUC     16.3    18.4
        GUG     30.9    32.9
        GUU     10.4     9.9
LYS     AAA     22.2    21.6
        AAG     34.9    37.1
ASN     AAC     22.6    22.4
        AAU     16.6    12.5
GLN     CAA
        [Yeast, Wheat and BPV1 L1/L2 values from PRO onward,
        and all rows after GLN CAA, are illegible]

TABLE 2

tRNA population changes as KC starts to differentiate.

[Table body illegible; only the abundance marks +++, +++,
+++, ++, + and ++ survive.]

WHAT IS CLAIMED IS:

1. A synthetic nucleic acid sequence capable
of selectively expressing a protein in a target cell
or tissue of a mammal, wherein said selective
expression is effected by replacing at least one
existing codon of a parent nucleic acid sequence with
a synonymous codon to form said synthetic nucleic acid
sequence.

2. The nucleic acid sequence of claim 1,
wherein said synonymous codon corresponds to an
iso-tRNA which, when compared to an iso-tRNA corresponding
to the at least one existing codon, is in higher
abundance in the target cell or tissue relative to one
or more other cells or tissues of the mammal.

3. The nucleic acid sequence of claim 1,
wherein said synonymous codon corresponds to an
iso-tRNA which, when compared to an iso-tRNA corresponding
to the at least one existing codon, is in higher
abundance in the target cell or tissue relative to a
precursor cell or tissue.

4. The nucleic acid sequence of claim 1,
wherein said synonymous codon corresponds to an
iso-tRNA which, when compared to an iso-tRNA corresponding
to the at least one existing codon, is in higher
abundance in the target cell or tissue relative to a
cell or tissue derived therefrom.

5. The nucleic acid sequence of claim 1,
wherein said synonymous codons for selective
expression of said protein are selected from the group
consisting of gca (Ala), cuu (Leu) and cua (Leu), and
said target is a differentiated cell.

6. The nucleic acid sequence of claim 5,
wherein said differentiated cell is a differentiated
keratinocyte.

7. The nucleic acid sequence of any one of
claims 2 to 4, wherein said corresponding iso-tRNA in
said target cell or tissue is at a level which is at
least 110%, preferably at least 200%, more preferably
at least 500%, and most preferably at least 1000%, of
that expressed in the or each other cell or tissue of
the mammal.

8. The nucleic acid sequence of claim 1,
wherein the synonymous codon may be selected from the
group consisting of (1) a codon used at relatively
high frequency by genes, preferably highly expressed
genes, of the target cell or tissue, (2) a codon used
at relatively high frequency by genes, preferably
highly expressed genes, of the or each other cell or
tissue, (3) a codon used at relatively high frequency
by genes, preferably highly expressed genes, of the
mammal, (4) a codon used at relatively low frequency
by genes of the target cell or tissue, (5) a codon
used at relatively low frequency by genes of the or
each other cell or tissue, (6) a codon used at
relatively low frequency by genes of the mammal, (7) a
codon used at relatively high frequency by genes of
another organism, and (8) a codon used at relatively
low frequency by genes of another organism.

9. The nucleic acid sequence of claim 1,
wherein the at least one existing codon and the
synonymous codon are selected such that said protein
is expressed from said synthetic nucleic acid sequence
in said target cell or tissue at a level which is at
least 110%, preferably at least 200%, more preferably
at least 500%, and most preferably at least 1000%, of
that expressed from said parent nucleic acid sequence
in said target cell or tissue.

10. A method for selectively expressing a
protein in a target cell or tissue of a mammal,
wherein said selective expression is effected by
replacing at least one existing codon of a parent
nucleic acid sequence with a synonymous codon to form
said synthetic nucleic acid sequence.

11. The method of claim 10, wherein said method
is further characterized by the steps of:

(a) replacing at least one existing codon
of a parent nucleic acid sequence encoding said
protein with a synonymous codon to produce a synthetic
nucleic acid sequence having altered translational
kinetics compared to said parent nucleic acid sequence
such that said protein is selectively expressible in
said target cell or tissue;

(b) administering to the mammal and
introducing into said target cell or tissue, or a
precursor cell or precursor tissue thereof, said
synthetic nucleic acid sequence operably linked to one
or more regulatory nucleotide sequences; and

(c) selectively expressing said protein
in said target cell or tissue.

12. The method of claim 11 further including,
prior to step (a):

(i) measuring relative abundance of
different iso-tRNAs in said target cell or tissue, and
in one or more other cells or tissues of the mammal;
and

(ii) identifying said at least one
existing codon and said synonymous codon based on said
measurement, wherein said synonymous codon corresponds
to an iso-tRNA which, when compared to an iso-tRNA
corresponding to the existing codon, is in higher
abundance in said target cell or tissue relative to
the or each other cell or tissue of the mammal.

13. The method of claim 12, wherein step (ii)
is further characterized in that said synonymous codon
corresponds to an iso-tRNA which, when compared to an
iso-tRNA corresponding to the at least one existing
codon, is in higher abundance in the target cell or
tissue relative to a precursor cell or tissue.

14. The method of claim 12, wherein step (ii)
is further characterized in that said synonymous codon
corresponds to an iso-tRNA which, when compared to an
iso-tRNA corresponding to the at least one existing
codon, is in higher abundance in the target cell or
tissue relative to a cell or tissue derived therefrom.

15. The method of claim 11 further including,
prior to step (a), identifying said at least one
existing codon and said synonymous codon based on
respective relative frequencies of particular codons
used by genes selected from the group consisting of
(I) genes of the target cell or tissue, (II) genes of
the or each other cell or tissue, (III) genes of the
mammal, and (IV) genes of another organism.

16. A method for expressing a protein in a
target cell or tissue from a first nucleic acid
sequence including the steps of:

introducing into said target cell or
tissue, or a precursor cell or precursor tissue
thereof, a second nucleic acid sequence encoding at
least one isoaccepting transfer RNA wherein said
second nucleic acid sequence is operably linked to one
or more regulatory nucleotide sequences, and wherein
said at least one isoaccepting transfer RNA is
normally in relatively low abundance in said target
cell or tissue and corresponds to a codon of said
first nucleic acid sequence.

17. A method for producing a virus particle in
a cycling eukaryotic cell, said virus particle
comprising at least one protein necessary for assembly
of said virus particle, wherein said at least one
protein is not expressed in said cell from a parent
nucleic acid sequence at a level sufficient to permit
virus assembly therein, said method including the
steps of:

(a) replacing at least one existing codon
of said parent nucleic acid sequence with a synonymous
codon to produce a synthetic nucleic acid sequence
having altered translational kinetics compared to said
parent nucleic acid sequence such that said at least
one protein is expressible from said synthetic nucleic
acid sequence in said cell at a level sufficient to
permit virus assembly therein;

(b) introducing into said cell or a
precursor thereof said synthetic nucleic acid sequence
operably linked to one or more regulatory nucleotide
sequences; and

(c) expressing said at least one protein
in said cell in the presence of other viral proteins
required for assembly of said virus particle to
thereby produce said virus particle.

18. A method for producing a virus particle in
a cycling cell, said virus particle comprising at
least one protein necessary for assembly of said virus
particle, wherein said at least one protein is not
expressed in said cell from a parent nucleic acid
sequence at a level sufficient to permit virus
assembly therein, and wherein at least one existing
codon of said parent nucleic acid sequence is rate
limiting for the production of said at least one protein
to said level, said method including the step of
introducing into said cell a nucleic acid sequence
capable of expressing therein an isoaccepting transfer
RNA specific for said at least one codon.

19. A vector comprising a nucleic acid sequence
according to any of claims 1 to 9 wherein said
synthetic nucleic acid sequence is operably linked to
one or more regulatory nucleic acid sequences.

20. A pharmaceutical composition comprising a
nucleic acid sequence according to any of claims 1 to
9 together with a pharmaceutically acceptable carrier.

21. A pharmaceutical composition comprising a
vector according to claim 19 together with a
pharmaceutically acceptable carrier.

22. A cell or tissue comprising therein a
nucleic acid sequence according to any of claims 1 to
9.

23. A cell or tissue comprising therein a
vector according to claim 19.

24. A cell or tissue resulting from a method
according to any one of claims 10 to 18.

25. Virus particles produced from a method
according to claims 17 or 18.

<130> Selective Expression

<140> PCT/AU98/00530
<141> 1998-07-09

<150> P07765
<151> 1997-07-09

<150> P09467
<151> 1997-09-11

<160> 34

<170> PatentIn Ver. 2.0

<210> 1
<211> 1488
<212> DNA
<213> Bovine papillomavirus type 1

<220>
<221> CDS
<222> (1)..(1488)

<220>
<223> L1 open reading frame (wild-type)

<400> 1

atg gcg ttg tgg caa caa ggc cag aag ctg tat ctc cct cca acc cct 48
Met Ala Leu Trp Gln Gln Gly Gln Lys Leu Tyr Leu Pro Pro Thr Pro
1 5 10 15

gta agc aag gtg ctt tgc agt gaa acc tat gtg caa aga aaa agc att 96
Val Ser Lys Val Leu Cys Ser Glu Thr Tyr Val Gln Arg Lys Ser Ile
20 25 30

ttt tat cat gca gaa acg gag cgc ctg cta act ata gga cat cca tat 144
Phe Tyr His Ala Glu Thr Glu Arg Leu Leu Thr Ile Gly His Pro Tyr
35 40 45

tac cca gtg tct atc ggg gcc aaa act gtt cct aag gtc tct gca aat 192
Tyr Pro Val Ser Ile Gly Ala Lys Thr Val Pro Lys Val Ser Ala Asn
50 55 60

cag tat agg gta ttt aaa ata caa cta cct gat ccc aat caa ttt gca 240
Gln Tyr Arg Val Phe Lys Ile Gln Leu Pro Asp Pro Asn Gln Phe Ala
65 70 75 80

cta cct gac agg act gtt cac aac cca agt aaa gag cgg ctg gtg tgg 288
Leu Pro Asp Arg Thr Val His Asn Pro Ser Lys Glu Arg Leu Val Trp
85 90 95

cca gtc ata ggt gtg cag gtg tcc aga ggg cag cct ctt gga ggt act 336
Pro Val Ile Gly Val Gln Val Ser Arg Gly Gln Pro Leu Gly Gly Thr
100 105 110

gta act ggg cac ccc act ttt aat gct ttg ctt gat gca gaa aat gtg 384
Val Thr Gly His Pro Thr Phe Asn Ala Leu Leu Asp Ala Glu Asn Val
115 120 125

aat aga aaa gtc acc acc caa aca aca gat gac agg aaa caa aca ggc 432
Asn Arg Lys Val Thr Thr Gln Thr Thr Asp Asp Arg Lys Gln Thr Gly
130 135 140

cta gat gct aag caa caa cag att ctg ttg cta ggc tgt acc cct gct 480
Leu Asp Ala Lys Gln Gln Gln Ile Leu Leu Leu Gly Cys Thr Pro Ala
145 150 155 160

gaa ggg gaa tat tgg aca aca gcc cgt cca tgt gtt act gat cgt cta 528
Glu Gly Glu Tyr Trp Thr Thr Ala Arg Pro Cys Val Thr Asp Arg Leu
165 170 175

gaa aat ggc gcc tgc cct cct ctt gaa tta aaa aac aag cac ata gaa 576
Glu Asn Gly Ala Cys Pro Pro Leu Glu Leu Lys Asn Lys His Ile Glu
180 185 190

gat ggg gat atg atg gaa att ggg ttt ggt gca gcc aac ttc aaa gaa 624
Asp Gly Asp Met Met Glu Ile Gly Phe Gly Ala Ala Asn Phe Lys Glu
195 200 205

att aat gca agt aaa tca gat cta cct ctt gac att caa aat gag atc 672
Ile Asn Ala Ser Lys Ser Asp Leu Pro Leu Asp Ile Gln Asn Glu Ile
210 215 220

tgc ttg tac cca gac tac ctc aaa atg gct gag gac gct gct ggt aat 720
Cys Leu Tyr Pro Asp Tyr Leu Lys Met Ala Glu Asp Ala Ala Gly Asn
225 230 235 240

agc atg ttc ttt ttt gca agg aaa gaa cag gtg tat gtt aga cac atc 768
Ser Met Phe Phe Phe Ala Arg Lys Glu Gln Val Tyr Val Arg His Ile
245 250 255

tgg acc aga ggg ggc tcg gag aaa gaa gcc cct acc aca gat ttt tat 816
Trp Thr Arg Gly Gly Ser Glu Lys Glu Ala Pro Thr Thr Asp Phe Tyr
260 265 270

tta aag aat aat aaa ggg gat gcc acc ctt aaa ata ccc agt gtg cat 864
Leu Lys Asn Asn Lys Gly Asp Ala Thr Leu Lys Ile Pro Ser Val His
275 280 285

ttt ggt agt ccc agt ggc tca cta gtc tca act gat aat caa att ttt 912
Phe Gly Ser Pro Ser Gly Ser Leu Val Ser Thr Asp Asn Gln Ile Phe
290 295 300

aat cgg ccc tac tgg cta ttc cgt gcc cag ggc atg aac aat gga att 960
Asn Arg Pro Tyr Trp Leu Phe Arg Ala Gln Gly Met Asn Asn Gly Ile
305 310 315 320

gca tgg aat aat tta ttg ttt tta aca gtg ggg gac aat aca cgt ggt 1008
Ala Trp Asn Asn Leu Leu Phe Leu Thr Val Gly Asp Asn Thr Arg Gly
325 330 335

act aat ctt acc ata agt gta gcc tca gat gga acc cca cta aca gag 1056
Thr Asn Leu Thr Ile Ser Val Ala Ser Asp Gly Thr Pro Leu Thr Glu
340 345 350

tat gat agc tca aaa ttc aat gta tac cat aga cat atg gaa gaa tat 1104
Tyr Asp Ser Ser Lys Phe Asn Val Tyr His Arg His Met Glu Glu Tyr
355 360 365

aag cta gcc ttt ata tta gag cta tgc tct gtg gaa atc aca gct caa 1152
Lys Leu Ala Phe Ile Leu Glu Leu Cys Ser Val Glu Ile Thr Ala Gln
370 375 380

act gtg tca cat ctg caa gga ctt atg ccc tct gtg ctt gaa aat tgg 1200
Thr Val Ser His Leu Gln Gly Leu Met Pro Ser Val Leu Glu Asn Trp
385 390 395 400

gaa ata ggt gtg cag cct cct acc tca tcg ata tta gag gac acc tat 1248
Glu Ile Gly Val Gln Pro Pro Thr Ser Ser Ile Leu Glu Asp Thr Tyr
405 410 415

cgc tat ata gag tct cct gca act aaa tgt gca agc aat gta att cct 1296
Arg Tyr Ile Glu Ser Pro Ala Thr Lys Cys Ala Ser Asn Val Ile Pro
420 425 430

gca aaa gaa gac cct tat gca ggg ttt aag ttt tgg aac ata gat ctt 1344
Ala Lys Glu Asp Pro Tyr Ala Gly Phe Lys Phe Trp Asn Ile Asp Leu
435 440 445

aaa gaa aag ctt tct ttg gac tta gat caa ttt ccc ttg gga aga aga 1392
Lys Glu Lys Leu Ser Leu Asp Leu Asp Gln Phe Pro Leu Gly Arg Arg
450 455 460

ttt tta gca cag caa ggg gca gga tgt tca act gtg aga aaa cga aga 1440
Phe Leu Ala Gln Gln Gly Ala Gly Cys Ser Thr Val Arg Lys Arg Arg
465 470 475 480

att agc caa aaa act tcc agt aag cct gca aaa aaa aaa aaa aaa taa 1488
Ile Ser Gln Lys Thr Ser Ser Lys Pro Ala Lys Lys Lys Lys Lys
485 490 495

<210> 2
<211> 495
<212> PRT
<213> Bovine papillomavirus type 1

<400> 2

Met Ala Leu Trp Gin Gin Gly Gin Lys Leu Tyr Leu Pro Pro Thr Pro 
1 5 10 15 
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Val Ser Lys Val Leu Cys Ser Glu Thr Tyr Val Gin Arg Lys Ser lie 
20 25 30 

Phe Tyr His Ala Glu Thr Glu Arg Leu Leu Thr He Gly His Pro Tyr 
35 40 45 

Tyr Pro Val Ser He Gly Ala Lys Thr Val Pro Lys Val Ser Ala Asn 
50 55 60 

Gin Tyr Arg Val Phe Lys He Gin Leu Pro Asp Pro Asn Gin Phe Ala 
65 70 75 80 

Leu Pro Asp Arg Thr Val His Asn Pro Ser Lys Glu Arg Leu Val Trp 
85 90 95 

Pro Val lie Gly Val Gin Val Ser Arg Gly Gin Pro Leu Gly Gly Thr 
100 105 110 

Val Thr Gly His Pro Thr Phe Asn Ala Leu Leu Asp Ala Glu Asn Val 
115 120 125 

Asn Arg Lys Val Thr Thr Gin Thr Thr Asp Asp Arg Lys Gin Thr Gly 
130 135 140 

Leu Asp Ala Lys Gin Gin Gin lie Leu Leu Leu Gly Cys Thr Pro Ala 
145 150 155 160 

Glu Gly Glu Tyr Trp Thr Thr Ala Arg Pro Cys Val Thr Asp Arg Leu 
165 170 175 

Glu Asn Gly Ala Cys Pro Pro Leu Glu Leu Lys Asn Lys His He Glu 
180 185 190 

Asp Gly Asp Met Met Glu He Gly Phe Gly Ala Ala Asn Phe Lys Glu 
195 200 205 

He Asn Ala Ser Lys Ser Asp Leu Pro Leu Asp He Gin Asn Glu He 
210 215 220 

Cys Leu Tyr Pro Asp Tyr Leu Lys Met Ala Glu Asp Ala Ala Gly Asn 
225 230 235 240 

Ser Met Phe Phe Phe Ala Arg Lys Glu Gin Val Tyr Val Arg His He 
245 250 255 

Trp Thr Arg Gly Gly Ser Glu Lys Glu Ala Pro Thr Thr Asp Phe Tyr 
260 265 270 

Leu Lys Asn Asn Lys Gly Asp Ala Thr Leu Lys He Pro Ser Val His 
275 280 285 

Phe Gly Ser Pro Ser Gly Ser Leu Val Ser Thr Asp Asn Gin He Phe 
290 295 300 

Asn Arg Pro Tyr Trp Leu Phe Arg Ala Gin Gly Met Asn Asn Gly He 
305 310 315 320 
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Ala Trp Asn Asn Leu Leu Phe Leu Thr Val Gly Asp Asn Thr Arg Gly 
325 330 335 

Thr Asn Leu Thr lie Ser Val Ala Ser Asp Gly Thr Pro Leu Thr Glu 
340 345 350 

Tyr Asp Ser Ser .Lys Phe Asn Val Tyr His Arg His Met Glu Glu Tyr 
355 360 365 

Lys Leu Ala Phe lie Leu Glu Leu Cys Ser Val Glu He Thr Ala Gin 
370 375 380 

Thr Val Ser His Leu Gin Gly Leu Met Pro Ser Val Leu Glu Asn Trp 
385 390 395 400 

Glu He Gly Val Gin Pro Pro Thr Ser Ser He Leu Glu Asp Thr Tyr 
405 410 415 

Arg Tyr He Glu Ser Pro Ala Thr Lys Cys Ala Ser Asn Val He Pro 
420 425 430 

Ala Lys Glu Asp Pro Tyr Ala Gly Phe Lys Phe Trp Asn He Asp Leu 
435 440 445 

Lys Glu Lys Leu Ser Leu Asp Leu Asp Gin Phe Pro Leu Gly Arg Arg 
450 455 460 

Phe Leu Ala Gin Gin Gly Ala Gly Cys Ser Thr Val Arg Lys Arg Arg 
465 470 475 480 

He Ser Gin Lys Thr Ser Ser Lys Pro Ala Lys Lys Lys Lys Lys 
485 490 495 



<210> 3 
<211> 1488 
<212> DNA 

<213> Artificial Sequence 

<220> 

<221> CDS 

<222> (1)..(1488)

<220> 

<223> Description of Artificial Sequence: Bovine 
papillomavirus type 1 LI open reading frame 
(humanized) 

<220> 

<223> Wild-type codons replaced with synonymous codons
used at relatively high frequency by human genes 

<400> 3 

atg gcc ctg tgg cag cag ggc cag aag ctg tac ctg ccc cct acc ccc 48
Met Ala Leu Trp Gln Gln Gly Gln Lys Leu Tyr Leu Pro Pro Thr Pro
1 5 10 15

gtg agc aag gtg ctt tgc agt gaa acc tat gtg caa aga aaa agc att 96
Val Ser Lys Val Leu Cys Ser Glu Thr Tyr Val Gln Arg Lys Ser Ile
20 25 30

ttt tat cat gca gaa acg gag cgc ctg ctg acc atc gga cac ccc tat 144
Phe Tyr His Ala Glu Thr Glu Arg Leu Leu Thr Ile Gly His Pro Tyr
35 40 45

tac ccc gtg tcc atc ggg gcc aag act gtg cct aag gtg tcc gcc aat 192
Tyr Pro Val Ser Ile Gly Ala Lys Thr Val Pro Lys Val Ser Ala Asn
50 55 60

cag tat agg gtg ttc aaa atc caa ctg cct gat ccc aat caa ttt gca 240
Gln Tyr Arg Val Phe Lys Ile Gln Leu Pro Asp Pro Asn Gln Phe Ala
65 70 75 80

ctg cct gac agg acc gtg cac aac ccc agc aaa gag cgg ctg gtg tgg 288
Leu Pro Asp Arg Thr Val His Asn Pro Ser Lys Glu Arg Leu Val Trp
85 90 95

cca gtg atc ggc gtg cag gtg tcc aga ggc cag cct ctg ggc ggc acc 336
Pro Val Ile Gly Val Gln Val Ser Arg Gly Gln Pro Leu Gly Gly Thr
100 105 110

gtg act ggg cac ccc act ttt aat gct ttg ctt gat gca gaa aat gtg 384
Val Thr Gly His Pro Thr Phe Asn Ala Leu Leu Asp Ala Glu Asn Val
115 120 125

aat aga aaa gtc acc acc cag acc acc gac gac agg aaa cag aca ggc 432
Asn Arg Lys Val Thr Thr Gln Thr Thr Asp Asp Arg Lys Gln Thr Gly
130 135 140

ctg gat gcc aag cag cag cag atc ctg ctg ctg ggc tgt acc cct gct 480
Leu Asp Ala Lys Gln Gln Gln Ile Leu Leu Leu Gly Cys Thr Pro Ala
145 150 155 160

gaa ggg gaa tat tgg aca aca gcc cgt cca tgt gtg acc gac cgt cta 528
Glu Gly Glu Tyr Trp Thr Thr Ala Arg Pro Cys Val Thr Asp Arg Leu
165 170 175

gaa aac ggc gcc tgc cct cct ctg gag ctg aaa aac aag cac atc gaa 576
Glu Asn Gly Ala Cys Pro Pro Leu Glu Leu Lys Asn Lys His Ile Glu
180 185 190

gat ggg gat atg atg gaa att ggg ttt ggt gca gcc aac ttc aaa gaa 624
Asp Gly Asp Met Met Glu Ile Gly Phe Gly Ala Ala Asn Phe Lys Glu
195 200 205

att aat gca agt aaa tca gat cta cct ctg gac atc caa aat gag atc 672
Ile Asn Ala Ser Lys Ser Asp Leu Pro Leu Asp Ile Gln Asn Glu Ile
210 215 220

tgc ctg tac ccc gac tac ctg aaa atg gct gag gac gcc gcc ggc aac 720
Cys Leu Tyr Pro Asp Tyr Leu Lys Met Ala Glu Asp Ala Ala Gly Asn
225 230 235 240

agc atg ttc ttc ttc gcc agg aag gag cag gtg tac gtg aga cac atc 768
Ser Met Phe Phe Phe Ala Arg Lys Glu Gln Val Tyr Val Arg His Ile
245 250 255

tgg acc aga ggc ggc tcc gag aaa gaa gcc cct acc aca gat ttt tat 816
Trp Thr Arg Gly Gly Ser Glu Lys Glu Ala Pro Thr Thr Asp Phe Tyr
260 265 270

ttg aag aac aac aag ggc gac gcc acc ctg aag atc ccc agc gtg cac 864
Leu Lys Asn Asn Lys Gly Asp Ala Thr Leu Lys Ile Pro Ser Val His
275 280 285

ttc ggc agc ccc agc ggc tca cta gtg tcc acc gac aac cag atc ttc 912
Phe Gly Ser Pro Ser Gly Ser Leu Val Ser Thr Asp Asn Gln Ile Phe
290 295 300



aac cgg ccc tac tgg ctg ttc cgc gcc cag ggc atg aac aat gga att 960
Asn Arg Pro Tyr Trp Leu Phe Arg Ala Gln Gly Met Asn Asn Gly Ile
305 310 315 320

gcc tgg aac aac ctg ctg ttc ctg acc gtg ggc gac aac aca cgt ggc 1008
Ala Trp Asn Asn Leu Leu Phe Leu Thr Val Gly Asp Asn Thr Arg Gly
325 330 335

acc aac ctg acc atc agc gtg gcc tcc gat gga acc cca ctg acc gag 1056
Thr Asn Leu Thr Ile Ser Val Ala Ser Asp Gly Thr Pro Leu Thr Glu
340 345 350

tat gat agc tcg aaa ttc aac gtg tac cac aga cac atg gag gag tat 1104
Tyr Asp Ser Ser Lys Phe Asn Val Tyr His Arg His Met Glu Glu Tyr
355 360 365

aag cta gcc ttc atc ctg gag ctg tgc tcc gtg gag atc acc gcc cag 1152
Lys Leu Ala Phe Ile Leu Glu Leu Cys Ser Val Glu Ile Thr Ala Gln
370 375 380

acc gtg tcc cat ctg caa gga ctg atg ccc tcc gtg ctg gag aat tgg 1200
Thr Val Ser His Leu Gln Gly Leu Met Pro Ser Val Leu Glu Asn Trp
385 390 395 400

gag atc ggc gtg cag ccc ccc acc tca tcg atc ttg gag gac acc tac 1248
Glu Ile Gly Val Gln Pro Pro Thr Ser Ser Ile Leu Glu Asp Thr Tyr
405 410 415

cgc tac atc gag tcc ccc gcc acc aag tgt gcc agc aac gtg att cct 1296
Arg Tyr Ile Glu Ser Pro Ala Thr Lys Cys Ala Ser Asn Val Ile Pro
420 425 430

gca aaa gaa gac cct tat gca ggg ttt aag ttc tgg aac atc gac ctg 1344
Ala Lys Glu Asp Pro Tyr Ala Gly Phe Lys Phe Trp Asn Ile Asp Leu
435 440 445

aag gag aag ctg tct ctg gac ctg gat cag ttc ccc ttg ggc aga aga 1392
Lys Glu Lys Leu Ser Leu Asp Leu Asp Gln Phe Pro Leu Gly Arg Arg
450 455 460

ttt ctg gcc cag cag ggg gcc ggc tgt tcc acc gtg aga aaa cgc agg 1440
Phe Leu Ala Gln Gln Gly Ala Gly Cys Ser Thr Val Arg Lys Arg Arg
465 470 475 480

atc agc cag aag acc tcc agc aag ccc gcc aag aag aag aaa aag taa 1488
Ile Ser Gln Lys Thr Ser Ser Lys Pro Ala Lys Lys Lys Lys Lys
485 490 495



<210> 4 
<211> 495 
<212> PRT 

<213> Artificial Sequence 
<400> 4 

Met Ala Leu Trp Gln Gln Gly Gln Lys Leu Tyr Leu Pro Pro Thr Pro
1 5 10 15

Val Ser Lys Val Leu Cys Ser Glu Thr Tyr Val Gln Arg Lys Ser Ile
20 25 30

Phe Tyr His Ala Glu Thr Glu Arg Leu Leu Thr Ile Gly His Pro Tyr
35 40 45

Tyr Pro Val Ser Ile Gly Ala Lys Thr Val Pro Lys Val Ser Ala Asn
50 55 60

Gln Tyr Arg Val Phe Lys Ile Gln Leu Pro Asp Pro Asn Gln Phe Ala
65 70 75 80

Leu Pro Asp Arg Thr Val His Asn Pro Ser Lys Glu Arg Leu Val Trp
85 90 95

Pro Val Ile Gly Val Gln Val Ser Arg Gly Gln Pro Leu Gly Gly Thr
100 105 110

Val Thr Gly His Pro Thr Phe Asn Ala Leu Leu Asp Ala Glu Asn Val
115 120 125

Asn Arg Lys Val Thr Thr Gln Thr Thr Asp Asp Arg Lys Gln Thr Gly
130 135 140

Leu Asp Ala Lys Gln Gln Gln Ile Leu Leu Leu Gly Cys Thr Pro Ala
145 150 155 160

Glu Gly Glu Tyr Trp Thr Thr Ala Arg Pro Cys Val Thr Asp Arg Leu
165 170 175

Glu Asn Gly Ala Cys Pro Pro Leu Glu Leu Lys Asn Lys His Ile Glu
180 185 190

Asp Gly Asp Met Met Glu Ile Gly Phe Gly Ala Ala Asn Phe Lys Glu
195 200 205

Ile Asn Ala Ser Lys Ser Asp Leu Pro Leu Asp Ile Gln Asn Glu Ile
210 215 220

Cys Leu Tyr Pro Asp Tyr Leu Lys Met Ala Glu Asp Ala Ala Gly Asn
225 230 235 240

Ser Met Phe Phe Phe Ala Arg Lys Glu Gln Val Tyr Val Arg His Ile
245 250 255

Trp Thr Arg Gly Gly Ser Glu Lys Glu Ala Pro Thr Thr Asp Phe Tyr
260 265 270

Leu Lys Asn Asn Lys Gly Asp Ala Thr Leu Lys Ile Pro Ser Val His
275 280 285

Phe Gly Ser Pro Ser Gly Ser Leu Val Ser Thr Asp Asn Gln Ile Phe
290 295 300

Asn Arg Pro Tyr Trp Leu Phe Arg Ala Gln Gly Met Asn Asn Gly Ile
305 310 315 320

Ala Trp Asn Asn Leu Leu Phe Leu Thr Val Gly Asp Asn Thr Arg Gly
325 330 335

Thr Asn Leu Thr Ile Ser Val Ala Ser Asp Gly Thr Pro Leu Thr Glu
340 345 350

Tyr Asp Ser Ser Lys Phe Asn Val Tyr His Arg His Met Glu Glu Tyr
355 360 365

Lys Leu Ala Phe Ile Leu Glu Leu Cys Ser Val Glu Ile Thr Ala Gln
370 375 380

Thr Val Ser His Leu Gln Gly Leu Met Pro Ser Val Leu Glu Asn Trp
385 390 395 400

Glu Ile Gly Val Gln Pro Pro Thr Ser Ser Ile Leu Glu Asp Thr Tyr
405 410 415

Arg Tyr Ile Glu Ser Pro Ala Thr Lys Cys Ala Ser Asn Val Ile Pro
420 425 430

Ala Lys Glu Asp Pro Tyr Ala Gly Phe Lys Phe Trp Asn Ile Asp Leu
435 440 445

Lys Glu Lys Leu Ser Leu Asp Leu Asp Gln Phe Pro Leu Gly Arg Arg
450 455 460

Phe Leu Ala Gln Gln Gly Ala Gly Cys Ser Thr Val Arg Lys Arg Arg
465 470 475 480

Ile Ser Gln Lys Thr Ser Ser Lys Pro Ala Lys Lys Lys Lys Lys
485 490 495 



<210> 5 

<211> 1410 

<212> DNA 

<213> Bovine papillomavirus type 1 
<220>

<221> CDS 

<222> (1)..(1410)

<220> 

<223> L2 open reading frame (wild-type) 
<400> 5 

atg agt gca cga aaa aga gta aaa cgt gcc agt gcc tat gac ctg tac 48
Met Ser Ala Arg Lys Arg Val Lys Arg Ala Ser Ala Tyr Asp Leu Tyr
1 5 10 15

agg acc tgc aag caa gcg ggc aca tgt cca cca gat gtg ata cga aag 96
Arg Thr Cys Lys Gln Ala Gly Thr Cys Pro Pro Asp Val Ile Arg Lys
20 25 30

gta gaa gga gat act ata gca gat aaa att ttg aaa ttt ggg ggt ctt 144
Val Glu Gly Asp Thr Ile Ala Asp Lys Ile Leu Lys Phe Gly Gly Leu
35 40 45

gca atc tac tta gga ggg cta gga ata gga aca tgg tct act gga agg 192
Ala Ile Tyr Leu Gly Gly Leu Gly Ile Gly Thr Trp Ser Thr Gly Arg
50 55 60

gtg gcc gca ggt gga tca cca agg tac aca cca ctc cga aca gca ggg 240
Val Ala Ala Gly Gly Ser Pro Arg Tyr Thr Pro Leu Arg Thr Ala Gly
65 70 75 80

tcc aca tca tcg ctt gca tca ata gga tcc aga gct gta aca gca ggg 288
Ser Thr Ser Ser Leu Ala Ser Ile Gly Ser Arg Ala Val Thr Ala Gly
85 90 95

acc cgc ccc agt ata ggt gcg ggc att cct tta gac acc ctt gaa act 336
Thr Arg Pro Ser Ile Gly Ala Gly Ile Pro Leu Asp Thr Leu Glu Thr
100 105 110

ctt ggg gcc ttg cgt cca ggg gtg tat gag gac act gtg cta cca gag 384
Leu Gly Ala Leu Arg Pro Gly Val Tyr Glu Asp Thr Val Leu Pro Glu
115 120 125

gcc cct gca ata gtc act cct gat gct gtt cct gca gat tca ggg ctt 432
Ala Pro Ala Ile Val Thr Pro Asp Ala Val Pro Ala Asp Ser Gly Leu
130 135 140

gat gcc ctg tcc ata ggt aca gac tcg tcc acg gag acc ctc att act 480
Asp Ala Leu Ser Ile Gly Thr Asp Ser Ser Thr Glu Thr Leu Ile Thr
145 150 155 160

ctg cta gag cct gag ggt ccc gag gac ata gcg gtt ctt gag ctg caa 528
Leu Leu Glu Pro Glu Gly Pro Glu Asp Ile Ala Val Leu Glu Leu Gln
165 170 175

ccc ctg gac cgt cca act tgg caa gta agc aat gct gtt cat cag tcc 576
Pro Leu Asp Arg Pro Thr Trp Gln Val Ser Asn Ala Val His Gln Ser
180 185 190

tct gca tac cac gcc cct ctg cag ctg caa tcg tcc att gca gaa aca 624
Ser Ala Tyr His Ala Pro Leu Gln Leu Gln Ser Ser Ile Ala Glu Thr
195 200 205

tct ggt tta gaa aat att ttt gta gga ggc tcg ggt tta ggg gat aca 672
Ser Gly Leu Glu Asn Ile Phe Val Gly Gly Ser Gly Leu Gly Asp Thr
210 215 220

gga gga gaa aac att gaa ctg aca tac ttc ggg tcc cca cga aca agc 720
Gly Gly Glu Asn Ile Glu Leu Thr Tyr Phe Gly Ser Pro Arg Thr Ser
225 230 235 240

acg ccc cgc agt att gcc tct aaa tca cgt ggc att tta aac tgg ttc 768
Thr Pro Arg Ser Ile Ala Ser Lys Ser Arg Gly Ile Leu Asn Trp Phe
245 250 255

agt aaa cgg tac tac aca cag gtg ccc acg gaa gat cct gaa gtg ttt 816
Ser Lys Arg Tyr Tyr Thr Gln Val Pro Thr Glu Asp Pro Glu Val Phe
260 265 270

tca tcc caa aca ttt gca aac cca ctg tat gaa gca gaa cca gct gtg 864
Ser Ser Gln Thr Phe Ala Asn Pro Leu Tyr Glu Ala Glu Pro Ala Val
275 280 285

ctt aag gga cct agt gga cgt gtt gga ctc agt cag gtt tat aaa cct 912
Leu Lys Gly Pro Ser Gly Arg Val Gly Leu Ser Gln Val Tyr Lys Pro
290 295 300

gat aca ctt aca aca cgt agc ggg aca gag gtg gga cca cag cta cat 960
Asp Thr Leu Thr Thr Arg Ser Gly Thr Glu Val Gly Pro Gln Leu His
305 310 315 320

gtc agg tac tca ttg agt act ata cat gaa gat gta gaa gca atc ccc 1008
Val Arg Tyr Ser Leu Ser Thr Ile His Glu Asp Val Glu Ala Ile Pro
325 330 335

tac aca gtt gat gaa aat aca cag gga ctt gca ttc gta ccc ttg cat 1056
Tyr Thr Val Asp Glu Asn Thr Gln Gly Leu Ala Phe Val Pro Leu His
340 345 350

gaa gag caa gca ggt ttt gag gag ata gaa tta gat gat ttt agt gag 1104
Glu Glu Gln Ala Gly Phe Glu Glu Ile Glu Leu Asp Asp Phe Ser Glu
355 360 365

aca cat aga ctg cta cct cag aac acc tct tct aca cct gtt ggt agt 1152
Thr His Arg Leu Leu Pro Gln Asn Thr Ser Ser Thr Pro Val Gly Ser
370 375 380

ggt gta cga aga agc ctc att cca act cga gaa ttt agt gca aca cgg 1200
Gly Val Arg Arg Ser Leu Ile Pro Thr Arg Glu Phe Ser Ala Thr Arg
385 390 395 400

cct aca ggt gtt gta acc tat ggc tca cct gac act tac tct gct agc 1248
Pro Thr Gly Val Val Thr Tyr Gly Ser Pro Asp Thr Tyr Ser Ala Ser
405 410 415

cca gtt act gac cct gat tct acc tct cct agt cta gtt atc gat gac 1296
Pro Val Thr Asp Pro Asp Ser Thr Ser Pro Ser Leu Val Ile Asp Asp
420 425 430

act act act aca cca atc att ata att gat ggg cac aca gtt gat ttg 1344
Thr Thr Thr Thr Pro Ile Ile Ile Ile Asp Gly His Thr Val Asp Leu
435 440 445

tac agc agt aac tac acc ttg cat ccc tcc ttg ttg agg aaa cga aaa 1392
Tyr Ser Ser Asn Tyr Thr Leu His Pro Ser Leu Leu Arg Lys Arg Lys
450 455 460

aaa cgg aaa cat gcc taa 1410
Lys Arg Lys His Ala 
465 470 



<210> 6 
<211> 469 
<212> PRT 

<213> Bovine papillomavirus type 1 
<400> 6 

Met Ser Ala Arg Lys Arg Val Lys Arg Ala Ser Ala Tyr Asp Leu Tyr
1 5 10 15

Arg Thr Cys Lys Gln Ala Gly Thr Cys Pro Pro Asp Val Ile Arg Lys
20 25 30

Val Glu Gly Asp Thr Ile Ala Asp Lys Ile Leu Lys Phe Gly Gly Leu
35 40 45

Ala Ile Tyr Leu Gly Gly Leu Gly Ile Gly Thr Trp Ser Thr Gly Arg
50 55 60

Val Ala Ala Gly Gly Ser Pro Arg Tyr Thr Pro Leu Arg Thr Ala Gly
65 70 75 80

Ser Thr Ser Ser Leu Ala Ser Ile Gly Ser Arg Ala Val Thr Ala Gly
85 90 95

Thr Arg Pro Ser Ile Gly Ala Gly Ile Pro Leu Asp Thr Leu Glu Thr
100 105 110

Leu Gly Ala Leu Arg Pro Gly Val Tyr Glu Asp Thr Val Leu Pro Glu
115 120 125

Ala Pro Ala Ile Val Thr Pro Asp Ala Val Pro Ala Asp Ser Gly Leu
130 135 140

Asp Ala Leu Ser Ile Gly Thr Asp Ser Ser Thr Glu Thr Leu Ile Thr
145 150 155 160

Leu Leu Glu Pro Glu Gly Pro Glu Asp Ile Ala Val Leu Glu Leu Gln
165 170 175

Pro Leu Asp Arg Pro Thr Trp Gln Val Ser Asn Ala Val His Gln Ser
180 185 190

Ser Ala Tyr His Ala Pro Leu Gln Leu Gln Ser Ser Ile Ala Glu Thr
195 200 205

Ser Gly Leu Glu Asn Ile Phe Val Gly Gly Ser Gly Leu Gly Asp Thr
210 215 220

Gly Gly Glu Asn Ile Glu Leu Thr Tyr Phe Gly Ser Pro Arg Thr Ser
225 230 235 240

Thr Pro Arg Ser Ile Ala Ser Lys Ser Arg Gly Ile Leu Asn Trp Phe
245 250 255

Ser Lys Arg Tyr Tyr Thr Gln Val Pro Thr Glu Asp Pro Glu Val Phe
260 265 270

Ser Ser Gln Thr Phe Ala Asn Pro Leu Tyr Glu Ala Glu Pro Ala Val
275 280 285

Leu Lys Gly Pro Ser Gly Arg Val Gly Leu Ser Gln Val Tyr Lys Pro
290 295 300

Asp Thr Leu Thr Thr Arg Ser Gly Thr Glu Val Gly Pro Gln Leu His
305 310 315 320

Val Arg Tyr Ser Leu Ser Thr Ile His Glu Asp Val Glu Ala Ile Pro
325 330 335

Tyr Thr Val Asp Glu Asn Thr Gln Gly Leu Ala Phe Val Pro Leu His
340 345 350

Glu Glu Gln Ala Gly Phe Glu Glu Ile Glu Leu Asp Asp Phe Ser Glu
355 360 365

Thr His Arg Leu Leu Pro Gln Asn Thr Ser Ser Thr Pro Val Gly Ser
370 375 380

Gly Val Arg Arg Ser Leu Ile Pro Thr Arg Glu Phe Ser Ala Thr Arg
385 390 395 400

Pro Thr Gly Val Val Thr Tyr Gly Ser Pro Asp Thr Tyr Ser Ala Ser
405 410 415

Pro Val Thr Asp Pro Asp Ser Thr Ser Pro Ser Leu Val Ile Asp Asp
420 425 430

Thr Thr Thr Thr Pro Ile Ile Ile Ile Asp Gly His Thr Val Asp Leu
435 440 445

Tyr Ser Ser Asn Tyr Thr Leu His Pro Ser Leu Leu Arg Lys Arg Lys
450 455 460

Lys Arg Lys His Ala
465 

<210> 7
<211> 1410 
<212> DNA 

<213> Artificial Sequence 

<220> 

<221> CDS 

<222> (1)..(1410)

<220> 

<223> Description of Artificial Sequence: Bovine 
papillomavirus type 1 L2 open reading frame 
(humanized) 

<220> 

<223> Wild-type codons replaced with synonymous codons 
used at relatively high frequency by human genes 

<400> 7 

atg agc gcc cgc aag aga gtg aag cgc gcc agc gcc tac gac ctg tac 48
Met Ser Ala Arg Lys Arg Val Lys Arg Ala Ser Ala Tyr Asp Leu Tyr
1 5 10 15

agg acc tgc aag cag gcc ggc aca tgt cca cca gat gtg atc cga aag 96
Arg Thr Cys Lys Gln Ala Gly Thr Cys Pro Pro Asp Val Ile Arg Lys
20 25 30

gtg gag ggc gac acc atc gcc gac aag atc ctg aag ttc ggc ggc ctg 144
Val Glu Gly Asp Thr Ile Ala Asp Lys Ile Leu Lys Phe Gly Gly Leu
35 40 45

gcc atc tac ctg ggc ggc ctg ggc atc gga aca tgg tct acc ggc agg 192
Ala Ile Tyr Leu Gly Gly Leu Gly Ile Gly Thr Trp Ser Thr Gly Arg
50 55 60

gtg gcc gcc ggc ggc tca cca agg tac acc cca ctg cgc acc gcc ggc 240
Val Ala Ala Gly Gly Ser Pro Arg Tyr Thr Pro Leu Arg Thr Ala Gly
65 70 75 80

tcc acc tcc tcc ctg gcc tcc atc gga tcc aga gcc gtg acc gcc ggg 288
Ser Thr Ser Ser Leu Ala Ser Ile Gly Ser Arg Ala Val Thr Ala Gly
85 90 95

acc cgc ccc tcc atc ggc gcg ggc atc cct ctg gac acc ctg gaa act 336
Thr Arg Pro Ser Ile Gly Ala Gly Ile Pro Leu Asp Thr Leu Glu Thr
100 105 110

ctt ggg gcc ctg cgc cct ggc gtg tac gag gac acc gtg ctg ccc gaa 384
Leu Gly Ala Leu Arg Pro Gly Val Tyr Glu Asp Thr Val Leu Pro Glu
115 120 125

gcc cct gcc atc gtg acc cct gac gcc gtg cct gca gac tcc ggc ctg 432
Ala Pro Ala Ile Val Thr Pro Asp Ala Val Pro Ala Asp Ser Gly Leu
130 135 140



SUBSTITUTE SHEET (RULE 26) 



WO 99/02694 PCT/AU98/00530 




gac gcc ctg tcc atc ggc aca gac tcc tcc acc gag acc ctg atc acc 480
Asp Ala Leu Ser Ile Gly Thr Asp Ser Ser Thr Glu Thr Leu Ile Thr
145 150 155 160

ctg ctg gag cct gag ggc ccc gaa gac ata gcc gtg ctg gaa ctc cag 528
Leu Leu Glu Pro Glu Gly Pro Glu Asp Ile Ala Val Leu Glu Leu Gln
165 170 175

ccc ctg gac cgc cca acc tgg cag gtg agc aat gct gtg cac cag tcc 576
Pro Leu Asp Arg Pro Thr Trp Gln Val Ser Asn Ala Val His Gln Ser
180 185 190

tct gcc tac cac gcc cct ctc cag ctg caa tcc tcc atc gcc gag aca 624
Ser Ala Tyr His Ala Pro Leu Gln Leu Gln Ser Ser Ile Ala Glu Thr
195 200 205

tct ggt tta gaa aat att ttt gta gga ggc tcg ggt tta ggg gat acc 672
Ser Gly Leu Glu Asn Ile Phe Val Gly Gly Ser Gly Leu Gly Asp Thr
210 215 220

ggc ggc gag aac atc gag ctg acc tac ttc ggc tcc ccc cgc acc agc 720
Gly Gly Glu Asn Ile Glu Leu Thr Tyr Phe Gly Ser Pro Arg Thr Ser
225 230 235 240

acc ccc cgc tcc atc gcc tcc aag tcc cgc ggc atc ctg aac tgg ttc 768
Thr Pro Arg Ser Ile Ala Ser Lys Ser Arg Gly Ile Leu Asn Trp Phe
245 250 255

agc aag cgg tac tac acc cag gtg ccc acc gaa gat ccc gaa gtg ttc 816
Ser Lys Arg Tyr Tyr Thr Gln Val Pro Thr Glu Asp Pro Glu Val Phe
260 265 270

tcc tcc cag acc ttc gcc aac ccc ctg tac gag gcc gag ccc gcc gtg 864
Ser Ser Gln Thr Phe Ala Asn Pro Leu Tyr Glu Ala Glu Pro Ala Val
275 280 285

ctg aag ggc cct agc ggc cgc gtg ggc ctg tcc cag gtg tac aag cct 912
Leu Lys Gly Pro Ser Gly Arg Val Gly Leu Ser Gln Val Tyr Lys Pro
290 295 300

gat acc ctg acc aca cgt agc ggc aca gag gtg ggc ccc cag ctg cat 960
Asp Thr Leu Thr Thr Arg Ser Gly Thr Glu Val Gly Pro Gln Leu His
305 310 315 320

gtg agg tac tcc ctg tcc acc atc cat gag gat gtg gag gct atc ccc 1008
Val Arg Tyr Ser Leu Ser Thr Ile His Glu Asp Val Glu Ala Ile Pro
325 330 335

tac acc gtg gat gag aac acc cag ggc ctg gcc ttc gtg ccc ctg cat 1056
Tyr Thr Val Asp Glu Asn Thr Gln Gly Leu Ala Phe Val Pro Leu His
340 345 350

gag gag cag gcc ggc ttc gag gag atc gag ctc gac gat ttc agc gag 1104
Glu Glu Gln Ala Gly Phe Glu Glu Ile Glu Leu Asp Asp Phe Ser Glu
355 360 365

acc cat cgc ctg ctg ccc cag aac acc tcc tcc acc ccc gtg ggc agc 1152
Thr His Arg Leu Leu Pro Gln Asn Thr Ser Ser Thr Pro Val Gly Ser
370 375 380

ggc gtg cgc aga agc ctg atc cct acc cga gag ttc agc gcc acc cgg 1200
Gly Val Arg Arg Ser Leu Ile Pro Thr Arg Glu Phe Ser Ala Thr Arg
385 390 395 400

cct acc ggc gtg gtg acc tac ggc tcc ccc gac acc tac tcc gct agc 1248
Pro Thr Gly Val Val Thr Tyr Gly Ser Pro Asp Thr Tyr Ser Ala Ser
405 410 415

ccc gtg acc gac cct gat tct acc tct cct agc ctg gtg atc gac gac 1296
Pro Val Thr Asp Pro Asp Ser Thr Ser Pro Ser Leu Val Ile Asp Asp
420 425 430

acc acc acc acc ccc atc atc atc atc gac ggc cac aca gtg gat ctg 1344
Thr Thr Thr Thr Pro Ile Ile Ile Ile Asp Gly His Thr Val Asp Leu
435 440 445

tac agc agc aac tac acc ctg cat ccc tcc ctg ctg agg aag cgc aag 1392
Tyr Ser Ser Asn Tyr Thr Leu His Pro Ser Leu Leu Arg Lys Arg Lys
450 455 460

aag cgc aag cat gcc taa 1410
Lys Arg Lys His Ala
465 470



<210> 8 
<211> 469 
<212> PRT 

<213> Artificial Sequence 
<400> 8 

Met Ser Ala Arg Lys Arg Val Lys Arg Ala Ser Ala Tyr Asp Leu Tyr
1 5 10 15

Arg Thr Cys Lys Gln Ala Gly Thr Cys Pro Pro Asp Val Ile Arg Lys
20 25 30

Val Glu Gly Asp Thr Ile Ala Asp Lys Ile Leu Lys Phe Gly Gly Leu
35 40 45

Ala Ile Tyr Leu Gly Gly Leu Gly Ile Gly Thr Trp Ser Thr Gly Arg
50 55 60

Val Ala Ala Gly Gly Ser Pro Arg Tyr Thr Pro Leu Arg Thr Ala Gly
65 70 75 80

Ser Thr Ser Ser Leu Ala Ser Ile Gly Ser Arg Ala Val Thr Ala Gly
85 90 95

Thr Arg Pro Ser Ile Gly Ala Gly Ile Pro Leu Asp Thr Leu Glu Thr
100 105 110

Leu Gly Ala Leu Arg Pro Gly Val Tyr Glu Asp Thr Val Leu Pro Glu
115 120 125

Ala Pro Ala Ile Val Thr Pro Asp Ala Val Pro Ala Asp Ser Gly Leu
130 135 140

Asp Ala Leu Ser Ile Gly Thr Asp Ser Ser Thr Glu Thr Leu Ile Thr
145 150 155 160

Leu Leu Glu Pro Glu Gly Pro Glu Asp Ile Ala Val Leu Glu Leu Gln
165 170 175

Pro Leu Asp Arg Pro Thr Trp Gln Val Ser Asn Ala Val His Gln Ser
180 185 190

Ser Ala Tyr His Ala Pro Leu Gln Leu Gln Ser Ser Ile Ala Glu Thr
195 200 205

Ser Gly Leu Glu Asn Ile Phe Val Gly Gly Ser Gly Leu Gly Asp Thr
210 215 220

Gly Gly Glu Asn Ile Glu Leu Thr Tyr Phe Gly Ser Pro Arg Thr Ser
225 230 235 240

Thr Pro Arg Ser Ile Ala Ser Lys Ser Arg Gly Ile Leu Asn Trp Phe
245 250 255

Ser Lys Arg Tyr Tyr Thr Gln Val Pro Thr Glu Asp Pro Glu Val Phe
260 265 270

Ser Ser Gln Thr Phe Ala Asn Pro Leu Tyr Glu Ala Glu Pro Ala Val
275 280 285

Leu Lys Gly Pro Ser Gly Arg Val Gly Leu Ser Gln Val Tyr Lys Pro
290 295 300

Asp Thr Leu Thr Thr Arg Ser Gly Thr Glu Val Gly Pro Gln Leu His
305 310 315 320

Val Arg Tyr Ser Leu Ser Thr Ile His Glu Asp Val Glu Ala Ile Pro
325 330 335

Tyr Thr Val Asp Glu Asn Thr Gln Gly Leu Ala Phe Val Pro Leu His
340 345 350

Glu Glu Gln Ala Gly Phe Glu Glu Ile Glu Leu Asp Asp Phe Ser Glu
355 360 365

Thr His Arg Leu Leu Pro Gln Asn Thr Ser Ser Thr Pro Val Gly Ser
370 375 380

Gly Val Arg Arg Ser Leu Ile Pro Thr Arg Glu Phe Ser Ala Thr Arg
385 390 395 400

Pro Thr Gly Val Val Thr Tyr Gly Ser Pro Asp Thr Tyr Ser Ala Ser
405 410 415

Pro Val Thr Asp Pro Asp Ser Thr Ser Pro Ser Leu Val Ile Asp Asp
420 425 430

Thr Thr Thr Thr Pro Ile Ile Ile Ile Asp Gly His Thr Val Asp Leu
435 440 445

Tyr Ser Ser Asn Tyr Thr Leu His Pro Ser Leu Leu Arg Lys Arg Lys
450 455 460

Lys Arg Lys His Ala
465 
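
The humanized reading frames in this listing are described as differing from their wild-type parents only by synonymous codons, so each pair should translate to the same protein. The following is a minimal Python sketch of that check, assuming the standard genetic code. The names `translate` and `is_synonymous` are illustrative and not part of the specification, and the two fragments are the first 16 codons of the L2 reading frames (SEQ ID NO: 5 and SEQ ID NO: 7), read on the assumption that scanning errors of "e" for "c" in the codons have been repaired.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence (whitespace ignored) to one-letter amino acids."""
    seq = "".join(dna.lower().split())
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def is_synonymous(parent: str, synthetic: str) -> bool:
    """True if both coding sequences encode the same amino acid sequence."""
    return translate(parent) == translate(synthetic)


# First 16 codons of the wild-type and humanized L2 reading frames
# (assumption: OCR-damaged codons such as "age" are read as "agc").
wild_type_l2 = "atg agt gca cga aaa aga gta aaa cgt gcc agt gcc tat gac ctg tac"
humanized_l2 = "atg agc gcc cgc aag aga gtg aag cgc gcc agc gcc tac gac ctg tac"

print(translate(wild_type_l2))                     # MSARKRVKRASAYDLY
print(is_synonymous(wild_type_l2, humanized_l2))   # True
```

The same check can be applied to complete reading frames once they have been transcribed without spacing or position numbers; a trailing stop codon translates to `*`.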



<210> 9 
<211> 717 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Aequorea 
victoria gfp gene (humanized) 

<220> 

<221> CDS 

<222> (1)..(717)

<400> 9 

atg agc aag ggc gag gaa ctg ttc act ggc gtg gtc cca att ctc gtg 48
Met Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu Val
1 5 10 15

gaa ctg gat ggc gat gtg aat ggg cac aaa ttt tct gtc agc gga gag 96
Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly Glu
20 25 30

ggt gaa ggt gat gcc aca tac gga aag ctc acc ctg aaa ttc atc tgc 144
Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile Cys
35 40 45

acc act gga aag ctc cct gtg cca tgg cca aca ctg gtc act acc ttc 192
Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr Phe
50 55 60

tct tat ggc gtg cag tgc ttt tcc aga tac cca gac cat atg aag cag 240
Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys Gln
65 70 75 80

cat gac ttt ttc aag agc gcc atg ccc gag ggc tat gtg cag gag aga 288
His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu Arg
85 90 95

acc atc ttt ttc aaa gat gac ggg aac tac aag acc cgc gct gaa gtc 336
Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val
100 105 110

aag ttc gaa ggt gac acc ctg gtg aat aga atc gag ctg aag ggc att 384
Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly Ile
115 120 125

gac ttt aag gag gat gga aac att ctc ggc cac aag ctg gaa tac aac 432
Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr Asn
130 135 140

tat aac tcc cac aat gtg tac atc atg gcc gac aag caa aag aat ggc 480
Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn Gly
145 150 155 160

atc aag gtc aac ttc aag atc aga cac aac att gag gat gga tcc gtg 528
Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser Val
165 170 175

cag ctg gcc gac cat tat caa cag aac act cca atc ggc gac ggc cct 576
Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly Pro
180 185 190

gtg ctc ctc cca gac aac cat tac ctg tcc acc cag tct gcc ctg tct 624
Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu Ser
195 200 205

aaa gat ccc aac gaa aag aga gac cac atg gtc ctg ctg gag ttt gtg 672
Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe Val
210 215 220

acc gct gct ggg atc aca cat ggc atg gac gag ctg tac aag tga 717
Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys
225 230 235 



<210> 10 
<211> 238 
<212> PRT 

<213> Artificial Sequence 
<400> 10 

Met Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu Val
1 5 10 15

Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly Glu
20 25 30

Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile Cys
35 40 45

Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr Phe
50 55 60

Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys Gln
65 70 75 80

His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu Arg
85 90 95

Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val
100 105 110

Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly Ile
115 120 125

Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr Asn
130 135 140

Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn Gly
145 150 155 160

Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser Val
165 170 175

Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly Pro
180 185 190

Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu Ser
195 200 205

Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe Val
210 215 220

Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys
225 230 235 



<210> 11 
<211> 717 
<212> DNA 

<213> Artificial Sequence 

<220> 

<221> CDS 

<222> (1)..(717)

<220> 

<223> Description of Artificial Sequence: Synthetic gfp 
gene (Papillomavirusized) 

<220> 

<223> Codons of humanized gfp gene replaced with 
synonymous codons used at relatively high 
frequency by papillomavirus genes 

<400> 11 

atg agt aaa ggg gaa gaa cta ttt aca ggg gtg gtg cct ata cta gtg 48
Met Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu Val
1 5 10 15

gaa cta gat ggg gat gtg aat ggg cac aaa ttt tct gtc agt ggg gaa 96
Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly Glu
20 25 30

ggg gaa ggg gat gca aca tat ggg aaa cta aca cta aaa ttt ata tgc 144
Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile Cys
35 40 45

aca aca ggg aaa cta cct gtg cca tgg cct aca cta gtg aca aca ttt 192
Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr Phe
50 55 60

agt tat ggg gtg caa tgc ttt agt aga tat cct gat cat atg aaa caa 240
Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys Gln
65 70 75 80

cat gat ttt ttt aaa agt gca atg ccc gag ggg tat gtg caa gaa aga 288
His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu Arg
85 90 95

aca ata ttt ttt aaa gat gat ggg aat tat aaa aca aga gca gaa gtc 336
Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val
100 105 110

aaa ttt gaa ggg gat aca cta gtg aat aga ata gag ctc aaa ggg ata 384
Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly Ile
115 120 125

gat ttt aaa gaa gat ggg aat ata cta ggg cat aaa cta gaa tat aat 432
Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr Asn
130 135 140

tat aat agt cat aat gtg tat ata atg gca gat aaa caa aaa aat ggg 480
Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn Gly
145 150 155 160

ata aaa gtg aat ttt aaa ata ata aga cat ata gaa gat gga tcc gtg 528
Ile Lys Val Asn Phe Lys Ile Ile Arg His Ile Glu Asp Gly Ser Val
165 170 175

caa cta gca gat cat tat caa caa aat aca cct ata ggg gat ggg cct 576
Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly Pro
180 185 190

gtg cta cta cct gat aac cat tat cta agt aca caa agt gca cta agt 624
Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu Ser
195 200 205

aaa gat cct aat gaa aaa aga gat cat atg gtg cta ctc gag ttt gtg 672
Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe Val
210 215 220

aca gca gca ggg ata aca cat ggg atg gat gaa cta tat aaa tga 717
Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys
225 230 235 



<210> 12 
<211> 238 
<212> PRT 

<213> Artificial Sequence 
<400> 12 

Met Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu Val
1 5 10 15

Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly Glu
20 25 30

Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile Cys
35 40 45

Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr Phe
50 55 60

Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys Gln
65 70 75 80

His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu Arg
85 90 95

Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val
100 105 110

Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly Ile
115 120 125

Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr Asn
130 135 140

Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn Gly
145 150 155 160

Ile Lys Val Asn Phe Lys Ile Ile Arg His Ile Glu Asp Gly Ser Val
165 170 175

Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly Pro
180 185 190

Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu Ser
195 200 205

Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe Val
210 215 220

Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys
225 230 235 
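
The feature notes for the synthetic reading frames describe replacing existing codons with synonymous codons used at relatively high frequency by a chosen set of genes. The following is a rough Python sketch of that substitution step, assuming the standard genetic code. The function `recode` and the preference map `PREFERRED` are hypothetical: the four entries are placeholders chosen for illustration and are not the codon-usage data of the specification.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical preference map: one-letter amino acid -> codon to use.
# Placeholder values for illustration only, not taken from the specification.
PREFERRED = {"S": "agc", "K": "aag", "G": "ggc", "E": "gag"}


def recode(dna: str, preferred: dict) -> str:
    """Replace each codon with the preferred synonymous codon, if one is given.

    Codons whose amino acid has no entry in `preferred` are left unchanged.
    """
    seq = "".join(dna.lower().split())
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    # Reject any preference that would change the encoded amino acid.
    for aa, codon in preferred.items():
        if CODON_TABLE.get(codon) != aa:
            raise ValueError(f"{codon!r} does not encode {aa!r}")
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return " ".join(preferred.get(CODON_TABLE[c], c) for c in codons)


# First six codons of the synthetic gfp reading frame of SEQ ID NO: 11.
fragment = "atg agt aaa ggg gaa gaa"
print(recode(fragment, PREFERRED))  # atg agc aag ggc gag gag
```

Because every substitution is checked against the codon table before use, the recoded sequence encodes the same amino acids as its input; only the codon choice changes.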



<210> 13
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Ala (GCA)

<400> 13

taaggactgt aagactt 17

<210> 14

<211> 17 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Arg(CGA) 

<400> 14 

cgagccagcc aggagtc 



<210> 15 

<211> 17 

<212> DNA 

<213> Artificial Sequence



<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Asn(AAC) 

<400> 15 

ctagattggc aggaatt 



<210> 16 

<211> 17 

<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Asp (GAC) 

<400> 16 

taagatatat agattat 



<210> 17 

<211> 17 

<212> DNA 

<213> Artificial Sequence



<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Cys (TGC) 

<400> 17 

aagtcttagt agagatt 



<210> 18 

<211> 17 

<212> DNA 

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Glu (GAA)

<400> 18

tatttctaca cagcatt 17



<210> 19 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Gln (CAA)

<400> 19 

ctaggacaat aggaatt 17 



<210> 20
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Gly (GGA)

<400> 20

tactctcttc tgggttt 17



<210> 21
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for His (CAC)

<400> 21

tgccgtgact cggattc 17



<210> 22
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Ile (ATC)

<400> 22

tagaaataag agggctt 17



<210> 23 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Leu (CTA) 

<400> 23 

tacttttatt tggattt 17 



<210> 24 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Leu(CTT)

<400> 24

tattagggag aggattt 17



<210> 25
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Lys (AAA)

<400> 25

tcactatgga gatttta 17



<210> 26
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Lys (AAG)

<400> 26

cgcccaacgt ggggctc 17



<210> 27 
<211> 17 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Met (elong) 

<400> 27 

tagtacggga aggattt 17 

<210> 28 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Phe (TTC) 

<400> 28 

tgtttatggg atacaat 17 



<210> 29 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Pro (CCA) 

<400> 29 

tcaagaagaa ggagcta 17 



<210> 30 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Pro(CCI) 

<400> 30 

gggctcgtcc gggattt 17 



<210> 31 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence:



Oligonucleotide specific for Ser (AGC) 
<400> 31 

ataagaaagg aagatcg 17 



<210> 32 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Thr (ACA) 

<400> 32 

tgtcttgaga agagaag 17 

<210> 33 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Tyr(TAC) 

<400> 33 

tggtaaaaag aggattt 17 

<210> 34 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Val (GTA) 

<400> 34 

tcagagtgtt cattggt 17 